Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Pedro Sousa
Hi,

That could be a problem with the neutron metadata service; check its logs.
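
For example, the metadata path can be sanity-checked end to end (a rough sketch;
169.254.169.254 is the standard metadata address and the log path assumes a
package-based install):

  # from inside an affected instance
  curl http://169.254.169.254/openstack/latest/meta_data.json

  # on the controllers
  systemctl status neutron-metadata-agent
  tail -f /var/log/neutron/metadata-agent.log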

Have you considered that the outage might have corrupted your databases
(neutron, nova, etc.)?

BR

On Thu, Jul 5, 2018 at 9:07 PM Torin Woltjer 
wrote:

> Are IP addresses set by cloud-init on boot? I noticed that cloud-init
> isn't working on my VMs. I created a new instance from an Ubuntu 18.04 image
> to test with; the hostname was not set to the name of the instance and I
> could not log in as the users I had specified in the configuration.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> *www.granddial.com*
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 12:57 PM
> *To*: torin.wolt...@granddial.com
> *Cc*: "openst...@lists.openstack.org" , "
> openstack-operators@lists.openstack.org" <
> openstack-operators@lists.openstack.org>
> *Subject*: Re: [Openstack] Recovering from full outage
> You should tcpdump inside the qdhcp namespace to see if the requests make
> it there, and also check iptables rules on the compute nodes for the return
> traffic.
>
>
> On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer <
> torin.wolt...@granddial.com> wrote:
>
>> Yes, I've done this. The VMs hang for a while waiting for DHCP and
>> eventually come up with no addresses. neutron-dhcp-agent has been restarted
>> on both controllers. The qdhcp netns's were all present; I stopped the
>> service, removed the qdhcp netns's, noted that the DHCP agents showed offline
>> in `neutron agent-list`, restarted all neutron services, noted that the qdhcp
>> netns's were recreated, and restarted a VM again; it still fails to pull an
>> IP address.
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> * www.granddial.com *
>>
>> --
>> *From*: George Mihaiescu 
>> *Sent*: 7/5/18 10:38 AM
>> *To*: torin.wolt...@granddial.com
>> *Subject*: Re: [Openstack] Recovering from full outage
>> Did you restart the neutron-dhcp-agent and reboot the VMs?
>>
>> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer <
>> torin.wolt...@granddial.com> wrote:
>>
>>> The qrouter netns appears once the lock_path is specified, and the neutron
>>> router is pingable as well. However, instances are not pingable. If I log
>>> in via the console, the instances have not been given IP addresses; if I
>>> manually give them an address and a route, they are pingable and seem to work.
>>> So the router is working correctly but DHCP is not.
>>>
>>> No errors in any of the neutron or nova logs on controllers or compute
>>> nodes.
>>>
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> * www.granddial.com *
>>>
>>> --
>>> *From*: "Torin Woltjer" 
>>> *Sent*: 7/5/18 8:53 AM
>>> *To*: 
>>> *Cc*: openstack-operators@lists.openstack.org,
>>> openst...@lists.openstack.org
>>> *Subject*: Re: [Openstack] Recovering from full outage
>>> There is no lock path set in my neutron configuration. Does it
>>> ultimately matter what it is set to as long as it is consistent? Does it
>>> need to be set on compute nodes as well as controllers?
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> * www.granddial.com *
>>>
>>> --
>>> *From*: George Mihaiescu 
>>> *Sent*: 7/3/18 7:47 PM
>>> *To*: torin.wolt...@granddial.com
>>> *Cc*: openstack-operators@lists.openstack.org,
>>> openst...@lists.openstack.org
>>> *Subject*: Re: [Openstack] Recovering from full outage
>>>
>>> Did you set a lock_path in neutron’s config?
>>>
>>> On Jul 3, 2018, at 17:34, Torin Woltjer 
>>> wrote:
>>>
>>> The following errors appear in the neutron-linuxbridge-agent.log on both
>>> controllers: http://paste.openstack.org/show/724930/
>>>
>>> No such errors are on the compute nodes themselves.
>>>
>>> *Torin Woltjer*
>>>
>>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>>
>>> *616.776.1066 ext. 2006*
>>> * www.granddial.com *

[Openstack-operators] Storage concerns when switching from a single controller to a HA setup

2018-07-05 Thread Jean-Philippe Méthot
Hi,

We’ve been running on OpenStack for several years now, and our setup has always
had a single controller. We are currently testing a switch to a dual-controller
HA setup, but an unexpected issue has appeared regarding storage. We use a Dell
Compellent SAN for our block devices. I notice that when I create a volume on
one controller, I am unable to perform any operation on the same volume from the
second controller (this is with an active/passive cinder-volume). Worse, this
affects VMs directly, as they can’t be migrated if the active controller isn’t
the one that created their block device.

I know this issue doesn’t happen with Ceph, so I’ve been wondering: is this a
limitation of OpenStack or of the SAN driver? Also, is there actually a way to
reach even active/passive high availability with this storage solution?
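
(For context, each volume seems to be bound to the cinder-volume host that
created it. A commonly suggested mitigation, untested on our side, is to give
both controllers the same backend host identity in cinder.conf so that either
node can manage the same volumes; a minimal sketch, assuming the Compellent
backend section is named [dellsc]:

  [dellsc]
  backend_host = cinder-volume-cluster

With both controllers reporting the same backend_host, volumes are no longer
tied to whichever node happened to create them.)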


Jean-Philippe Méthot
Openstack system administrator
Administrateur système Openstack
PlanetHoster inc.




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [Openstack] [nova][api] Novaclient redirect endpoint https into http

2018-07-05 Thread Monty Taylor

On 07/05/2018 01:55 PM, melanie witt wrote:

+openstack-dev@

On Wed, 4 Jul 2018 14:50:26 +, Bogdan Katynski wrote:
But I can not use the nova command; the nova endpoint has been redirected
from https to http. Here: http://prntscr.com/k2e8s6 (command: nova
--insecure service list)
First of all, it seems that the nova client is hitting the /v2.1 URI instead
of /v2.1/, and this seems to be triggering the redirect.


Since openstack CLI works, I presume it must be using the correct URL 
and hence it’s not getting redirected.


And this is the error log: Unable to establish connection to
http://192.168.30.70:8774/v2.1/: ('Connection aborted.',
BadStatusLine("''",))
Looks to me that nova-api does a redirect to an absolute URL. I 
suspect SSL is terminated on the HAProxy and nova-api itself is 
configured without SSL so it redirects to an http URL.


In my opinion, nova would be more load-balancer friendly if it used a 
relative URI in the redirect but that’s outside of the scope of this 
question and since I don’t know the context behind choosing the 
absolute URL, I could be wrong on that.


Thanks for mentioning this. We do have a bug open in python-novaclient 
around a similar issue [1]. I've added comments based on this thread and 
will consult with the API subteam to see if there's something we can do 
about this in nova-api.


A similar thing came up the other day related to keystone and version 
discovery. Version discovery documents tend to return full urls - even 
though relative urls would make public/internal API endpoints work 
better. (also, sometimes people don't configure things properly and the 
version discovery url winds up being incorrect)


In shade/sdk - we actually construct a wholly-new discovery url based on 
the url used for the catalog and the url in the discovery document since 
we've learned that the version discovery urls are frequently broken.


This is problematic because SOMETIMES people have public urls deployed 
as a sub-url and internal urls deployed on a port - so you have:


Catalog:
public: https://example.com/compute
internal: https://compute.example.com:1234

Version discovery:
https://example.com/compute/v2.1

When we go to combine the catalog url and the versioned url, if the user
is hitting internal, we produce
https://compute.example.com:1234/compute/v2.1 - because we have no way
of systematically knowing that /compute should also be stripped.


VERY LONG WINDED WAY of saying 2 things:

a) Relative URLs would be *way* friendlier (and incidentally are 
supported by keystoneauth, openstacksdk and shade - and are written up 
as being a thing people *should* support in the documents about API 
consumption)


b) Can we get agreement that changing behavior to return or redirect to 
a relative URL would not be considered an api contract break? (it's 
possible the answer to this is 'no' - so it's a real question)


Monty

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Torin Woltjer
Are IP addresses set by cloud-init on boot? I noticed that cloud-init isn't
working on my VMs. I created a new instance from an Ubuntu 18.04 image to test
with; the hostname was not set to the name of the instance and I could not log
in as the users I had specified in the configuration.
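
(For reference, a quick way to see what cloud-init itself reports from inside
the test instance; a sketch, assuming the stock Ubuntu 18.04 image and its
cloud-init 18.x:

  cloud-init status --long
  grep -iE 'datasource|169.254.169.254' /var/log/cloud-init.log )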

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 12:57 PM
To: torin.wolt...@granddial.com
Cc: "openst...@lists.openstack.org" , 
"openstack-operators@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
You should tcpdump inside the qdhcp namespace to see if the requests make it 
there, and also check iptables rules on the compute nodes for the return 
traffic.

On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer  
wrote:
Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually
come up with no addresses. neutron-dhcp-agent has been restarted on both
controllers. The qdhcp netns's were all present; I stopped the service, removed
the qdhcp netns's, noted that the DHCP agents showed offline in `neutron
agent-list`, restarted all neutron services, noted that the qdhcp netns's were
recreated, and restarted a VM again; it still fails to pull an IP address.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 10:38 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Did you restart the neutron-dhcp-agent and reboot the VMs?

On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer  
wrote:
The qrouter netns appears once the lock_path is specified, and the neutron
router is pingable as well. However, instances are not pingable. If I log in via
the console, the instances have not been given IP addresses; if I manually give
them an address and a route, they are pingable and seem to work. So the router
is working correctly but DHCP is not.

No errors in any of the neutron or nova logs on controllers or compute nodes.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/5/18 8:53 AM
To: 
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
And all of the instances can be properly started; however, I cannot ping any of
the instances' floating IPs or the neutron router. And when logging into an
instance with the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
wrote:
We just suffered a power outage in our data center and I'm having trouble
recovering the OpenStack cluster. All of the nodes are back online; every
instance shows active, but `virsh list --all` on the compute nodes shows that
all of the VMs are actually shut down.

Re: [Openstack-operators] [Openstack] [nova][api] Novaclient redirect endpoint https into http

2018-07-05 Thread melanie witt

+openstack-dev@

On Wed, 4 Jul 2018 14:50:26 +, Bogdan Katynski wrote:

But I can not use the nova command; the nova endpoint has been redirected from https
to http. Here: http://prntscr.com/k2e8s6 (command: nova --insecure service list)

First of all, it seems that the nova client is hitting the /v2.1 URI instead of
/v2.1/, and this seems to be triggering the redirect.

Since openstack CLI works, I presume it must be using the correct URL and hence 
it’s not getting redirected.

  
And this is the error log: Unable to establish connection to http://192.168.30.70:8774/v2.1/: ('Connection aborted.', BadStatusLine("''",))
  

Looks to me that nova-api does a redirect to an absolute URL. I suspect SSL is 
terminated on the HAProxy and nova-api itself is configured without SSL so it 
redirects to an http URL.
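
One way to confirm that (a sketch; the VIP and port are placeholders for the
real endpoint) is to request the unslashed URI and look at where the redirect
points:

  curl -ksi https://<haproxy-vip>:8774/v2.1 | grep -i '^location:'

A Location: header starting with http:// would confirm that the redirect is
being generated behind the TLS-terminating proxy.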

In my opinion, nova would be more load-balancer friendly if it used a relative 
URI in the redirect but that’s outside of the scope of this question and since 
I don’t know the context behind choosing the absolute URL, I could be wrong on 
that.


Thanks for mentioning this. We do have a bug open in python-novaclient 
around a similar issue [1]. I've added comments based on this thread and 
will consult with the API subteam to see if there's something we can do 
about this in nova-api.


-melanie

[1] https://bugs.launchpad.net/python-novaclient/+bug/1776928




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread George Mihaiescu
You should tcpdump inside the qdhcp namespace to see if the requests make
it there, and also check iptables rules on the compute nodes for the return
traffic.
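
Something along these lines (a sketch; the network UUID and interface name need
substituting for real values):

  # on the controller running the DHCP agent for that network
  ip netns list | grep qdhcp
  ip netns exec qdhcp-<network-uuid> ip addr          # find the interface name
  ip netns exec qdhcp-<network-uuid> tcpdump -ni <interface> port 67 or port 68

  # on the compute node, check the iptables path for DHCP traffic, e.g.
  iptables -L -nv | grep -E 'dpt:(67|68)'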


On Thu, Jul 5, 2018 at 12:39 PM, Torin Woltjer 
wrote:

> Yes, I've done this. The VMs hang for a while waiting for DHCP and
> eventually come up with no addresses. neutron-dhcp-agent has been restarted
> on both controllers. The qdhcp netns's were all present; I stopped the
> service, removed the qdhcp netns's, noted that the DHCP agents showed offline
> in `neutron agent-list`, restarted all neutron services, noted that the qdhcp
> netns's were recreated, and restarted a VM again; it still fails to pull an
> IP address.
>
> *Torin Woltjer*
>
> *Grand Dial Communications - A ZK Tech Inc. Company*
>
> *616.776.1066 ext. 2006*
> * www.granddial.com *
>
> --
> *From*: George Mihaiescu 
> *Sent*: 7/5/18 10:38 AM
> *To*: torin.wolt...@granddial.com
> *Subject*: Re: [Openstack] Recovering from full outage
> Did you restart the neutron-dhcp-agent and reboot the VMs?
>
> On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer <
> torin.wolt...@granddial.com> wrote:
>
>> The qrouter netns appears once the lock_path is specified, and the neutron
>> router is pingable as well. However, instances are not pingable. If I log
>> in via the console, the instances have not been given IP addresses; if I
>> manually give them an address and a route, they are pingable and seem to work.
>> So the router is working correctly but DHCP is not.
>>
>> No errors in any of the neutron or nova logs on controllers or compute
>> nodes.
>>
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> * www.granddial.com *
>>
>> --
>> *From*: "Torin Woltjer" 
>> *Sent*: 7/5/18 8:53 AM
>> *To*: 
>> *Cc*: openstack-operators@lists.openstack.org,
>> openst...@lists.openstack.org
>> *Subject*: Re: [Openstack] Recovering from full outage
>> There is no lock path set in my neutron configuration. Does it ultimately
>> matter what it is set to as long as it is consistent? Does it need to be
>> set on compute nodes as well as controllers?
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> * www.granddial.com *
>>
>> --
>> *From*: George Mihaiescu 
>> *Sent*: 7/3/18 7:47 PM
>> *To*: torin.wolt...@granddial.com
>> *Cc*: openstack-operators@lists.openstack.org,
>> openst...@lists.openstack.org
>> *Subject*: Re: [Openstack] Recovering from full outage
>>
>> Did you set a lock_path in neutron’s config?
>>
>> On Jul 3, 2018, at 17:34, Torin Woltjer 
>> wrote:
>>
>> The following errors appear in the neutron-linuxbridge-agent.log on both
>> controllers: http://paste.openstack.org/show/724930/
>>
>> No such errors are on the compute nodes themselves.
>>
>> *Torin Woltjer*
>>
>> *Grand Dial Communications - A ZK Tech Inc. Company*
>>
>> *616.776.1066 ext. 2006*
>> * www.granddial.com *
>>
>> --
>> *From*: "Torin Woltjer" 
>> *Sent*: 7/3/18 5:14 PM
>> *To*: 
>> *Cc*: "openstack-operators@lists.openstack.org" <
>> openstack-operators@lists.openstack.org>, "openst...@lists.openstack.org"
>> 
>> *Subject*: Re: [Openstack] Recovering from full outage
>> Running `openstack server reboot` on an instance just causes the instance
>> to be stuck in a rebooting status. Most notable of the logs is
>> neutron-server.log which shows the following:
>> http://paste.openstack.org/show/724917/
>>
>> I realized that rabbitmq was in a failed state, so I bootstrapped it,
>> rebooted controllers, and all of the agents show online.
>> http://paste.openstack.org/show/724921/

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Torin Woltjer
Yes, I've done this. The VMs hang for a while waiting for DHCP and eventually
come up with no addresses. neutron-dhcp-agent has been restarted on both
controllers. The qdhcp netns's were all present; I stopped the service, removed
the qdhcp netns's, noted that the DHCP agents showed offline in `neutron
agent-list`, restarted all neutron services, noted that the qdhcp netns's were
recreated, and restarted a VM again; it still fails to pull an IP address.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/5/18 10:38 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Did you restart the neutron-dhcp-agent and reboot the VMs?

On Thu, Jul 5, 2018 at 10:30 AM, Torin Woltjer  
wrote:
The qrouter netns appears once the lock_path is specified, and the neutron
router is pingable as well. However, instances are not pingable. If I log in via
the console, the instances have not been given IP addresses; if I manually give
them an address and a route, they are pingable and seem to work. So the router
is working correctly but DHCP is not.

No errors in any of the neutron or nova logs on controllers or compute nodes.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/5/18 8:53 AM
To: 
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
And all of the instances can be properly started; however, I cannot ping any of
the instances' floating IPs or the neutron router. And when logging into an
instance with the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.
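
(Roughly, assuming systemd-managed services and the default package log paths:

  openstack server reboot --hard <instance-uuid>
  tail -f /var/log/nova/nova-compute.log
  tail -f /var/log/neutron/linuxbridge-agent.log )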

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
wrote:
We just suffered a power outage in our data center and I'm having trouble
recovering the OpenStack cluster. All of the nodes are back online; every
instance shows active, but `virsh list --all` on the compute nodes shows that
all of the VMs are actually shut down. Running `ip addr` on any of the nodes
shows that none of the bridges are present, and `ip netns` shows that all of the
network namespaces are missing as well. So despite all of the neutron services
running, none of the networking appears to be active, which is concerning. How
do I solve this without recreating all of the networks?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openst...@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Torin Woltjer
The qrouter netns appears once the lock_path is specified, and the neutron
router is pingable as well. However, instances are not pingable. If I log in via
the console, the instances have not been given IP addresses; if I manually give
them an address and a route, they are pingable and seem to work. So the router
is working correctly but DHCP is not.

No errors in any of the neutron or nova logs on controllers or compute nodes.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/5/18 8:53 AM
To: 
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
And all of the instances can be properly started; however, I cannot ping any of
the instances' floating IPs or the neutron router. And when logging into an
instance with the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
wrote:
We just suffered a power outage in our data center and I'm having trouble
recovering the OpenStack cluster. All of the nodes are back online; every
instance shows active, but `virsh list --all` on the compute nodes shows that
all of the VMs are actually shut down. Running `ip addr` on any of the nodes
shows that none of the bridges are present, and `ip netns` shows that all of the
network namespaces are missing as well. So despite all of the neutron services
running, none of the networking appears to be active, which is concerning. How
do I solve this without recreating all of the networks?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openst...@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Openstack] Recovering from full outage

2018-07-05 Thread Torin Woltjer
There is no lock path set in my neutron configuration. Does it ultimately 
matter what it is set to as long as it is consistent? Does it need to be set on 
compute nodes as well as controllers?
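
(For reference, the setting lives in the [oslo_concurrency] section of
neutron.conf; a typical value, assuming a package-based install, is:

  [oslo_concurrency]
  lock_path = /var/lib/neutron/tmp

The agents use it for file-based locks, so it needs to point at a writable
directory wherever neutron agents run, on compute nodes as well as controllers.)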

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 7:47 PM
To: torin.wolt...@granddial.com
Cc: openstack-operators@lists.openstack.org, openst...@lists.openstack.org
Subject: Re: [Openstack] Recovering from full outage

Did you set a lock_path in neutron’s config?

On Jul 3, 2018, at 17:34, Torin Woltjer  wrote:

The following errors appear in the neutron-linuxbridge-agent.log on both 
controllers: http://paste.openstack.org/show/724930/

No such errors are on the compute nodes themselves.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: "Torin Woltjer" 
Sent: 7/3/18 5:14 PM
To: 
Cc: "openstack-operators@lists.openstack.org" 
, "openst...@lists.openstack.org" 

Subject: Re: [Openstack] Recovering from full outage
Running `openstack server reboot` on an instance just causes the instance to be 
stuck in a rebooting status. Most notable of the logs is neutron-server.log 
which shows the following:
http://paste.openstack.org/show/724917/

I realized that rabbitmq was in a failed state, so I bootstrapped it, rebooted 
controllers, and all of the agents show online.
http://paste.openstack.org/show/724921/
And all of the instances can be properly started; however, I cannot ping any of
the instances' floating IPs or the neutron router. And when logging into an
instance with the console, there is no IP address on any interface.

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: George Mihaiescu 
Sent: 7/3/18 11:50 AM
To: torin.wolt...@granddial.com
Subject: Re: [Openstack] Recovering from full outage
Try restarting them using "openstack server reboot" and also check the 
nova-compute.log and neutron agents logs on the compute nodes.

On Tue, Jul 3, 2018 at 11:28 AM, Torin Woltjer  
wrote:
We just suffered a power outage in our data center and I'm having trouble
recovering the OpenStack cluster. All of the nodes are back online; every
instance shows active, but `virsh list --all` on the compute nodes shows that
all of the VMs are actually shut down. Running `ip addr` on any of the nodes
shows that none of the bridges are present, and `ip netns` shows that all of the
network namespaces are missing as well. So despite all of the neutron services
running, none of the networking appears to be active, which is concerning. How
do I solve this without recreating all of the networks?

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openst...@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [publiccloud-wg] Meeting this afternoon for Public Cloud WG

2018-07-05 Thread Jean-Daniel Bonnetot
Sorry guys, I'm not available once again.
See you next time.

Jean-Daniel Bonnetot
ovh.com  | @pilgrimstack
 

On 05/07/2018 09:59, "Tobias Rydberg"  wrote:

Hi folks,

Time for a new meeting for the Public Cloud WG. Agenda draft can be 
found at https://etherpad.openstack.org/p/publiccloud-wg, feel free to 
add items to that list.

See you all at IRC 1400 UTC in #openstack-publiccloud

Cheers,
Tobias

-- 
Tobias Rydberg
Senior Developer
Twitter & IRC: tobberydberg

www.citynetwork.eu | www.citycloud.com

INNOVATION THROUGH OPEN IT INFRASTRUCTURE
ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] [publiccloud-wg] Meeting this afternoon for Public Cloud WG

2018-07-05 Thread Tobias Rydberg

Hi folks,

Time for a new meeting for the Public Cloud WG. Agenda draft can be 
found at https://etherpad.openstack.org/p/publiccloud-wg, feel free to 
add items to that list.


See you all at IRC 1400 UTC in #openstack-publiccloud

Cheers,
Tobias

--
Tobias Rydberg
Senior Developer
Twitter & IRC: tobberydberg

www.citynetwork.eu | www.citycloud.com

INNOVATION THROUGH OPEN IT INFRASTRUCTURE
ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators