Re: [Openstack] Fatal error during container create (ansible-openstack on bionic)

2018-07-12 Thread Ruth Ivimey-Cook

Hi Mohammed,

Xenial doesn't fit my needs in other ways, so reverting to it isn't an 
option. I also want to use Bionic's spin of OpenStack rather than run with 
packages from another repo, and I don't want to wait an unspecified period 
for Rocky's appearance. I am prepared to invest some time in debugging 
this build, but I was hoping for some assistance here as a courtesy.


The only reason I looked at os-a was that configuring openstack on its 
own was unfathomably complex. Even with ansible it is still very complex 
for what I feel should be a simple task. I don't really understand why 
there isn't a default usable (i.e. one-box cloud) configuration in the 
.rpm/.deb packages which users can then extend and adapt, as with other 
similar tools.


Where would I look for the current work on the os-a Rocky update? 
Perhaps I can derive something useful from that.


Regards,
Ruth


On 11/07/2018 20:35, Mohammed Naser wrote:

Hi there,

Bionic isn't currently supported and we're working on adding support for
it in the Rocky cycle!  We recommend you deploy on Xenial.

Thanks,
Mohammed

On Wed, Jul 11, 2018 at 11:44 AM, Ruth Ivimey-Cook  wrote:

I am getting a fatal error in lxc_create when running openstack-ansible
playbooks/setup-hosts.yml and hoping someone can push me in the right
direction. Logs below...

I am interpreting the fatal error as some sort of missing config, which is
why I included the warnings that appeared earlier in the run. Is that
right? Is there any way I can isolate where exactly in the ansible setup
this happens?

The only significant changes I've made to the ansible setup are the two below (roughly sketched as shell commands after the list):

- comment out `linux-image-extra-{{ ansible_kernel }}` package from the
ubuntu config as it no longer exists.
- create /etc/ansible/.../*ubuntu-18.04.yml files by copying the equivalent
ubuntu-16.04.yml file, where no 18.04 version was already present.
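
Roughly, those changes amounted to something like the following (a sketch,
not the exact commands I ran; it assumes the roles live under
/etc/ansible/roles, as the task paths in the log below suggest):

# 1. The `linux-image-extra-{{ ansible_kernel }}` entry was commented out
#    by hand in the role's Ubuntu package list.

# 2. Copy 16.04 vars files to 18.04 names wherever no 18.04 file exists yet:
$ cd /etc/ansible/roles
$ for f in $(find . -name 'ubuntu-16.04*.yml'); do
>   new="${f/16.04/18.04}"
>   [ -e "$new" ] || sudo cp "$f" "$new"
> done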


~/openstack-ansible$ sudo openstack-ansible playbooks/setup-hosts.yml

Variable files: "-e @/etc/openstack_deploy/user_secrets.yml -e
@/etc/openstack_deploy/user_variables.yml "

  [WARNING]: Unable to parse /etc/openstack_deploy/inventory.ini as an
inventory source
[DEPRECATION WARNING]: 'include' for playbook includes. You should use
'import_playbook' instead. This feature will be removed in version 2.8.
Deprecation warnings can be disabled by setting deprecation_warnings=False
in ansible.cfg.

  [WARNING]: Could not match supplied host pattern, ignoring:
all_lxc_containers

  [WARNING]: Could not match supplied host pattern, ignoring:
all_nspawn_containers

PLAY [Install Ansible prerequisites]
*

TASK [Ensure python is installed]


ok: [aio1]


... lots of stuff that works...


TASK [Create the new LXC service log directory]
**

ok: [aio1]

TASK [Create the LXC service log aggregation link]
***

ok: [aio1]

TASK [apt_package_pinning : Add apt pin preferences]
*

TASK [lxc_hosts : Check for the presence of a public key file on the
deployment host] 

ok: [aio1 -> localhost]

TASK [lxc_hosts : Fail if a ssh public key is not set in a var and is not
present on the deployment host] 

TASK [lxc_hosts : Gather variables for each operating system]


ok: [aio1] =>
(item=/etc/ansible/roles/lxc_hosts/vars/ubuntu-18.04-host.yml)

TASK [lxc_hosts : Gather container variables]


  [WARNING]: Invalid request to find a file that matches a "null" value

ok: [aio1] => (item=/etc/ansible/roles/lxc_hosts/vars/ubuntu-18.04.yml)

TASK [lxc_hosts : include_tasks]
*

included: /etc/ansible/roles/lxc_hosts/tasks/lxc_pre_install.yml for aio1

A little later in the same run:


TASK [lxc_container_create : Check the physical_host variable is set]


TASK [lxc_container_create : Collect physical host facts if missing]
*

TASK [lxc_container_create : Kernel version and LXC backing store check]
*

TASK [lxc_container_create : Gather variables for each operating system]
*

  [WARNING]: Invalid request to find a file that matches a "null" value

  [WARNING]: Invalid request to find a file that matches a "null" value

ok: [aio1_cinder_api_container-3255dd97] =>
(item=/etc/ansible/roles/lxc_container_create/vars/ubuntu-18.04.yml)

  [WARNING]: Invalid request to find a file that matches a "null" value

ok: [aio1_designate_container-54f1c305] =>
(item=/etc/ansible/roles/lxc_container_create/vars/ubuntu-18.04.yml)

  [WARNI

Re: [Openstack] [Openstack-operators] Recovering from full outage

2018-07-12 Thread Torin Woltjer
Checking iptables for the metadata-proxy inside of qrouter provides the 
following:
$ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e iptables-save -c | 
grep 169
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m 
tcp --dport 80 -j REDIRECT --to-ports 9697
[0:0] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p tcp -m 
tcp --dport 80 -j MARK --set-xmark 0x1/0x
Packets:Bytes are both 0, so no traffic is touching this rule?

Interestingly the command:
$ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -anep | 
grep 9697
returns nothing, so there isn't actually anything running on 9697 in the 
network namespace...

This is the output without grep:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address   Foreign Address State   
User   Inode  PID/Program name
raw0  0 0.0.0.0:112 0.0.0.0:*   7   
0  76154  8404/keepalived
raw0  0 0.0.0.0:112 0.0.0.0:*   7   
0  76153  8404/keepalived
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags   Type   State I-Node   PID/Program name 
Path
unix  2  [ ] DGRAM645017567/python2
unix  2  [ ] DGRAM799538403/keepalived

Could the reason no traffic is touching the rule be that nothing is listening on 
that port, or is there a second issue further down the chain?

Curl fails even after restarting the neutron-dhcp-agent and the 
neutron-metadata-agent.
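
One thing I'm considering trying next (an untested guess): the per-router 
proxy on 9697 is started by the L3 agent rather than by the metadata agent, 
so restarting that agent, or toggling the router, might respawn it. The 
service name below is an assumption and may differ per distro:

$ sudo systemctl restart neutron-l3-agent
# or force the agent to rebuild the router:
$ openstack router set --disable 80c3bc40-b49c-446a-926f-99811adc0c5e
$ openstack router set --enable 80c3bc40-b49c-446a-926f-99811adc0c5e
# then re-check for a listener in the namespace:
$ ip netns exec qrouter-80c3bc40-b49c-446a-926f-99811adc0c5e netstat -lnp | grep 9697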

Thank you for this, and any future help.




Re: [Openstack] [Openstack-operators] Recovering from full outage

2018-07-12 Thread Brian Haley

On 07/12/2018 08:20 AM, Torin Woltjer wrote:
The neutron-metadata-agent service is running, the agent is alive, 
and it is listening on port 8775. However, new instances still do not 
get any information like hostname or keypair. If I run `curl 
192.168.116.22:8775` from the compute nodes, I do get a response. The 
metadata agent is running, listening, and accessible from the compute 
nodes; and it worked previously.


I'm stumped.


There is also a metadata proxy that runs in the qrouter namespace; you 
can verify it's running and getting requests by looking at both the 
iptables and netstat output.


$ sudo ip netns exec qrouter-$ID iptables-save -c | grep 169
[16:960] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ -p 
tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697
[96:7968] -A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -i qr-+ 
-p tcp -m tcp --dport 80 -j MARK --set-xmark 0x1/0x


The numbers inside [] represent packets:bytes, so non-zero is good.

$ sudo ip netns exec qrouter-$ID netstat -anep | grep 9697
tcp0  0 0.0.0.0:96970.0.0.0:* 
LISTEN  0  294339  4867/haproxy


If you have a running instance you can log into, running curl to the 
metadata IP from it would be helpful for diagnosis, since that request 
goes through this entire path.
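
For example, from a shell on the instance itself, something like:

$ ip route get 169.254.169.254
$ curl http://169.254.169.254/latest/meta-data/

would show both the route the instance uses and whether the request makes 
it through the proxy.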


-Brian



Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com


From: அருண் குமார் (Arun Kumar) 
Sent: 7/12/18 12:01 AM
To: torin.wolt...@granddial.com
Cc: "openstack@lists.openstack.org" , openstack-operat...@lists.openstack.org

Subject: Re: [Openstack-operators] [Openstack] Recovering from full outage
Hi Torin,

If I run `ip netns exec qrouter netstat -lnp` or `ip netns exec
qdhcp netstat -lnp` on the controller, should I see anything
listening on the metadata port (8775)? When I run these commands I
don't see anything listening there, but I have no example of a working
system to check against. Can anybody verify this?


In either the qrouter or qdhcp namespace you won't see port 8775; 
instead, check whether the metadata service is running on the neutron 
controller node(s) and listening on port 8775. Also, you can verify the 
metadata and neutron services using the following commands:


service neutron-metadata-agent status
neutron agent-list
netstat -ntplua | grep :8775


Thanks & Regards
Arun

ஃஃ
With love,
Arun
Let us make technology flourish in our language
http://thangamaniarun.wordpress.com
ஃஃ





Re: [Openstack] [Openstack-operators] Recovering from full outage

2018-07-12 Thread John Petrini
You might want to try giving the neutron-dhcp and metadata agents a restart.


Re: [Openstack] [Openstack-operators] Recovering from full outage

2018-07-12 Thread Torin Woltjer
I tested this on two instances. The first instance has existed since before I 
began having this issue. The second was created from a cirros test image.

On the first instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev ens3 proto dhcp metric 100.
curl returns information, for example;
`curl http://169.254.169.254/latest/meta-data/public-keys`
0=nextcloud

On the second instance:
The route exists: 169.254.169.254 via 172.16.1.1 dev eth0
curl fails;
`curl http://169.254.169.254/latest/meta-data`
curl: (7) Failed to connect to 169.254.169.254 port 80: Connection timed out

I am curious why one is able to connect but not the other, given that both 
instances were running on the same compute node.
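
If it helps, here is roughly how I can double-check that both instances sit 
on the same network and behind the same router (instance names are 
placeholders for my actual ones):

$ openstack server show <first-instance> -c addresses
$ openstack server show <cirros-instance> -c addresses
# confirm both subnets hang off the same router:
$ openstack port list --router 80c3bc40-b49c-446a-926f-99811adc0c5e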

Torin Woltjer

Grand Dial Communications - A ZK Tech Inc. Company

616.776.1066 ext. 2006
www.granddial.com


From: John Petrini 
Sent: 7/12/18 9:16 AM
To: torin.wolt...@granddial.com
Cc: thangam.ar...@gmail.com, OpenStack Operators 
, OpenStack Mailing List 

Subject: Re: [Openstack-operators] [Openstack] Recovering from full outage
Are your instances receiving a route to the metadata service (169.254.169.254) 
from DHCP? Can you curl the endpoint? curl 
http://169.254.169.254/latest/meta-data




Re: [Openstack] [Openstack-operators] Recovering from full outage

2018-07-12 Thread John Petrini
Are your instances receiving a route to the metadata service
(169.254.169.254) from DHCP? Can you curl the endpoint? curl
http://169.254.169.254/latest/meta-data