All,

As a follow-up to this, we eventually discovered that the issues were caused by a recent migration to a new public cloud provider (they host our stateful data - user accounts, etc.). Our old cloud provider allowed publicly-routable IPs to be attached directly to instances, and our new one does not (they use 1:1 NAT). The 1:1 NAT was causing issues with our LDAP provider (FreeIPA) and would occasionally cause the keystone-wsgi service to hang (but only when using the LDAP keystone domain - accounts in the default SQL domain worked fine). Pointing keystone at a fully-routable address over a VPN instead of at the 1:1 NAT address over the internet made the issues disappear. This may have been more a case of FreeIPA not liking 1:1 NAT than anything else, but I thought I'd share in case anyone runs into something similar in the future.
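
For anyone wanting to compare the two network paths to their LDAP server, here is a minimal sketch of a TCP probe with an explicit timeout. It is not what we ran, only an illustration: the function name `probe_tcp` is made up, and it checks reachability at the TCP level only (it does not speak LDAP, so it would not catch a hang that happens after the connection is established).

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection and return (status, seconds_elapsed).

    status is "ok", "timeout", or "error: <reason>".
    """
    start = time.monotonic()
    try:
        # create_connection raises instead of blocking forever
        # once the timeout expires.
        with socket.create_connection((host, port), timeout=timeout):
            status = "ok"
    except socket.timeout:
        status = "timeout"
    except OSError as exc:
        status = "error: %s" % exc
    return status, time.monotonic() - start
```

Running something like `probe_tcp("<NAT address>", 636)` and `probe_tcp("<VPN address>", 636)` side by side (port 636 is an assumption; use whichever port your LDAP server listens on) would show whether one path times out or is noticeably slower than the other.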

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

On 2018-02-20 10:27 PM, Chris Apsey wrote:
All,

Currently experiencing a sporadic issue with our keystone endpoints.
Throughout the day, keystone will just stop responding on both the
admin and public endpoints, which will cause all services to hang.
Restarting apache2 fixes the issue for some amount of time, but it
inevitably appears again later on.  Here is what I am seeing:

keystone: /var/log/apache2/keystone.log

    2018-02-20 21:50:38.830302 Timeout when reading response headers from daemon process 'keystone-admin': /usr/bin/keystone-wsgi-admin
    2018-02-20 21:50:50.799587 Timeout when reading response headers from daemon process 'keystone-admin': /usr/bin/keystone-wsgi-admin
    2018-02-20 21:51:02.857266 Timeout when reading response headers from daemon process 'keystone-admin': /usr/bin/keystone-wsgi-admin
    2018-02-20 21:51:02.879630 mod_wsgi (pid=1221): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-admin'.
    2018-02-20 21:51:02.879796 IOError: failed to write data
    2018-02-20 21:51:07.005702 mod_wsgi (pid=1220): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-admin'.
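
To get a feel for how often each daemon process times out, the log lines above can be tallied with a small script. This is only a sketch: the function name `count_timeouts` is made up, and the pattern assumes the message text is exactly as it appears in the excerpt above.

```python
import re
from collections import Counter

# Whitespace between words is matched loosely so that lines wrapped
# by a mail client or pager still match.
TIMEOUT_RE = re.compile(
    r"Timeout\s+when\s+reading\s+response\s+headers\s+"
    r"from\s+daemon\s+process\s+'([^']+)'"
)


def count_timeouts(log_text):
    """Return a Counter mapping daemon process name to timeout count."""
    return Counter(TIMEOUT_RE.findall(log_text))
```

For example, `count_timeouts(open("/var/log/apache2/keystone.log").read())` would return the number of timeout messages per daemon process name in that file.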

horizon: /var/log/apache2/error.log

    [Tue Feb 20 21:47:02.582511 2018] [wsgi:error] [pid 1227:tid 140591048296192] [client 10.10.5.200:57462] Timeout when reading response headers from daemon process 'horizon': /usr/share/openstack-dashboard/openstack_dashboard/wsgi/django.wsgi, referer: https://vta.cybbh.space/horizon/project/instances/900e9d57-752d-488c-8dba-ffc098e1a51a/
    [Tue Feb 20 21:48:03.962589 2018] [wsgi:error] [pid 1225:tid 140591249823488] ERROR openstack_auth.user Unable to retrieve project list.
    [Tue Feb 20 21:48:03.962646 2018] [wsgi:error] [pid 1225:tid 140591249823488] Traceback (most recent call last):
    [Tue Feb 20 21:48:03.962656 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/openstack_auth/user.py", line 350, in authorized_tenants
    [Tue Feb 20 21:48:03.962665 2018] [wsgi:error] [pid 1225:tid 140591249823488]     is_federated=self.is_federated)
    [Tue Feb 20 21:48:03.962673 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/openstack_auth/utils.py", line 372, in get_project_list
    [Tue Feb 20 21:48:03.962734 2018] [wsgi:error] [pid 1225:tid 140591249823488]     projects = client.projects.list(user=kwargs.get('user_id'))
    [Tue Feb 20 21:48:03.962744 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/positional/__init__.py", line 101, in inner
    [Tue Feb 20 21:48:03.962752 2018] [wsgi:error] [pid 1225:tid 140591249823488]     return wrapped(*args, **kwargs)
    [Tue Feb 20 21:48:03.962759 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneclient/v3/projects.py", line 119, in list
    [Tue Feb 20 21:48:03.962767 2018] [wsgi:error] [pid 1225:tid 140591249823488]     **kwargs)
    [Tue Feb 20 21:48:03.962774 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneclient/base.py", line 75, in func
    [Tue Feb 20 21:48:03.962782 2018] [wsgi:error] [pid 1225:tid 140591249823488]     return f(*args, **new_kwargs)
    [Tue Feb 20 21:48:03.962789 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneclient/base.py", line 390, in list
    [Tue Feb 20 21:48:03.962796 2018] [wsgi:error] [pid 1225:tid 140591249823488]     self.collection_key)
    [Tue Feb 20 21:48:03.962803 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneclient/base.py", line 125, in _list
    [Tue Feb 20 21:48:03.962811 2018] [wsgi:error] [pid 1225:tid 140591249823488]     resp, body = self.client.get(url, **kwargs)
    [Tue Feb 20 21:48:03.962818 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 288, in get
    [Tue Feb 20 21:48:03.962826 2018] [wsgi:error] [pid 1225:tid 140591249823488]     return self.request(url, 'GET', **kwargs)
    [Tue Feb 20 21:48:03.962833 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 447, in request
    [Tue Feb 20 21:48:03.962841 2018] [wsgi:error] [pid 1225:tid 140591249823488]     resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
    [Tue Feb 20 21:48:03.962848 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneauth1/adapter.py", line 192, in request
    [Tue Feb 20 21:48:03.962855 2018] [wsgi:error] [pid 1225:tid 140591249823488]     return self.session.request(url, method, **kwargs)
    [Tue Feb 20 21:48:03.962863 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/positional/__init__.py", line 101, in inner
    [Tue Feb 20 21:48:03.962870 2018] [wsgi:error] [pid 1225:tid 140591249823488]     return wrapped(*args, **kwargs)
    [Tue Feb 20 21:48:03.962877 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneauth1/session.py", line 703, in request
    [Tue Feb 20 21:48:03.962885 2018] [wsgi:error] [pid 1225:tid 140591249823488]     resp = send(**kwargs)
    [Tue Feb 20 21:48:03.962892 2018] [wsgi:error] [pid 1225:tid 140591249823488]   File "/usr/lib/python2.7/dist-packages/keystoneauth1/session.py", line 777, in _send_request
    [Tue Feb 20 21:48:03.962899 2018] [wsgi:error] [pid 1225:tid 140591249823488]     raise exceptions.ConnectFailure(msg)
    [Tue Feb 20 21:48:03.962907 2018] [wsgi:error] [pid 1225:tid 140591249823488] ConnectFailure: Unable to establish connection to https://*******:5000/v3/users/7e68b998ee1ec26139d3482818c9643d1ce3b5aff532c865cff65e1c9fe01306/projects?: ('Connection aborted.', BadStatusLine("''",))


I get the same behavior regardless of service and regardless of
whether I use the CLI or Horizon.  All signs point to keystone
being the culprit.

I have adjusted my /etc/apache2/sites-available/keystone.conf:

WSGIDaemonProcess keystone-public processes=8 threads=4 user=keystone group=keystone display-name=%{GROUP}
WSGIDaemonProcess keystone-admin processes=8 threads=4 user=keystone group=keystone display-name=%{GROUP}

And ensured that WSGIApplicationGroup %{GLOBAL} is present.
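
As a rough way to reason about these settings: as far as I understand mod_wsgi daemon mode, each daemon process group can handle at most processes x threads requests at once, so 8 x 4 = 32 per group here. If every thread ends up waiting on a slow backend, the group stops answering until something frees up. The sketch below just does that arithmetic from a directive line; the function name `wsgi_request_slots` is made up, and the fallback values are the mod_wsgi defaults as I understand them.

```python
def wsgi_request_slots(directive):
    """Return (group_name, processes * threads) for one WSGIDaemonProcess line."""
    tokens = directive.split()
    if len(tokens) < 2 or tokens[0] != "WSGIDaemonProcess":
        raise ValueError("not a WSGIDaemonProcess directive")
    # Collect key=value options that follow the group name.
    options = dict(tok.split("=", 1) for tok in tokens[2:] if "=" in tok)
    # Assumed mod_wsgi defaults when an option is omitted:
    # 1 process and 15 threads.
    processes = int(options.get("processes", 1))
    threads = int(options.get("threads", 15))
    return tokens[1], processes * threads
```

Feeding it the keystone-admin line above gives 32 concurrent request slots for that group.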

haproxy is sitting in between keystone and all other services, and is
configured as follows:

defaults
  log  global
  maxconn  16384
  option  redispatch
  retries  3
  timeout  http-request 30s
  timeout  queue 1m
  timeout  connect 30s
  timeout  client 2m
  timeout  server 2m
  timeout  check 10s

...

listen keystone_admin_cluster
  bind 10.10.5.200:35357
  balance  source
  option  tcpka
  option  httpchk
  option  tcplog
  server keystone-0 10.10.5.120:35357 check inter 2000 rise 2 fall 5
  server keystone-1 10.10.5.121:35357 check inter 2000 rise 2 fall 5

listen keystone_public_internal_cluster
  bind 10.50.10.0:5000 ssl crt /etc/letsencrypt/live/*****/master.pem
  bind 10.10.5.200:5000
  balance  roundrobin
  option  tcpka
  option  httpchk
  option  tcplog
  server keystone-0 10.10.5.120:5000 check inter 2000 rise 2 fall 5
  server keystone-1 10.10.5.121:5000 check inter 2000 rise 2 fall 5

...


Any ideas on where else I should look?

Thanks in advance,

---
v/r

Chris Apsey
bitskr...@bitskrieg.net
https://www.bitskrieg.net

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
