Re: [OpenStack-Infra] proposal: custom favicon for review.o.o

2020-01-29 Thread Ian Wienand
On Wed, Jan 29, 2020 at 06:35:28AM +, Sorin Sbarnea wrote:
> I guess that means that you are not against the idea.

I know it's probably not what you want to hear, but as it seems
favicons are becoming a component of branding like a logo I think
you'd do well to run your proposed work by someone with the expertise
to evaluate it with respect to whatever branding standards we have (I
imagine someone on the TC would have such contacts from the Foundation
or whoever does marketing).

If you just make something up and send it, you're probably going to
get review questions like "how can we know this meets the branding
standards to be the logo on our most popular website" or "is this the
right size, format etc. for browsers in 2020" which are things
upstream marketing and web people could sign off on.  So, personally,
I'd suggest a bit of pre-coordination there would mean any resulting
technical changes would be very non-controversial.

-i


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] tarballs.openstack.org to AFS publishing gameplan

2020-01-29 Thread Ian Wienand
On Wed, Jan 29, 2020 at 05:21:49AM +, Jeremy Stanley wrote:
> Of course I meant from /(.*) to tarballs.opendev.org/openstack/$1 so
> that clients actually get directed to the correct files. ;)

Ahh yes, sorry you mentioned that in IRC and I should have
incorporated that.  I'm happy with that; we can also have that
in-place and test it by overriding our hosts files before any
cut-over.
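
(Something like the following, with a placeholder address, would do a
quick check without even editing hosts files:

  # send a request to the new server while keeping the production DNS name
  curl -I --resolve tarballs.openstack.org:443:203.0.113.10 \
      https://tarballs.openstack.org/

but an /etc/hosts entry amounts to the same thing.)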

-i


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] proposal: custom favicon for review.o.o

2020-01-28 Thread Ian Wienand
On Fri, Jan 24, 2020 at 09:32:00AM +, Sorin Sbarnea wrote:
> We are currently using default Gerrit favicon on
> https://review.opendev.org and I would like to propose changing it
> in order to ease differentiation between it and other gerrit servers
> we may work with.

I did notice google started putting this next to search results
recently too [1], but then maybe reverted the change.

> How hard it would be to override it? (where)

I'm 99% sure it's built-in from [2] and there's no way to override it
at runtime.  It looks like for robots.txt we tell the apache that
fronts gerrit to look elsewhere [3]; I imagine the same would need to
be done for favicon.ico.

... also be aware that upcoming containerisation of gerrit probably
invalidates all that.

-i

[1] 
https://www.theverge.com/2020/1/24/21080424/google-search-result-ads-desktop-favicon-redesign-backtrack-controversial-experiment
[2] 
https://opendev.org/opendev/gerrit/src/branch/openstack/2.13.12/gerrit-war/src/main/webapp
[3] 
https://opendev.org/opendev/puppet-gerrit/src/branch/master/templates/gerrit.vhost.erb#L71


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] tarballs.openstack.org to AFS publishing gameplan

2020-01-28 Thread Ian Wienand
Hello,

We're at the point of implementing the tarballs.openstack.org
publishing changes from [1], and I would just like to propose some
low-level plans for feedback that exceeded the detail in the spec.

We currently have tarballs.opendev.org which publishes content from
/afs/openstack.org/project/opendev.org/tarballs.  This is hosted on
the physical server files02.openstack.org and managed by puppet [2].

 1) I propose we move tarballs.opendev.org to be served by
static01.opendev.org and configured via ansible

Because:

 * it's one less thing running on a Xenial host with puppet we don't
   want to maintain.
 * it will be alongside tarballs.openstack.org per below

The /afs/openstack.org/project/opendev.org directory is currently a
single AFS volume "project.opendev" and contains subdirectories:

 docs tarballs www

opendev.org jobs currently write their tarball content into the AFS
location, which is periodically "vos released" by [3].

 2) I propose we make a separate volume, with separate quota, and
mount it at /afs/openstack.org/project/tarballs.opendev.org.  We
copy the current data to that location and modify the opendev.org
tarball publishing jobs to use that location, and setup the same
periodic release.

Because:

 * Although currently the volume is tiny (<100mb), it will become
   quite large when combined with ~140gb of openstack repos
 * this seems distinctly separate from docs and www data
 * we have content for other hosts at /afs/openstack.org/project like
   this, it fits logically.
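
Concretely, that amounts to something like the following (server,
partition, volume name and quota below are all illustrative):

  vos create afs01.dfw.openstack.org vicepa project.tarballs -maxquota 200000000
  fs mkmount /afs/.openstack.org/project/tarballs.opendev.org project.tarballs
  vos release project           # parent volume, so the mountpoint appears
  vos release project.tarballs  # and periodically thereafter, as now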

The next steps are described in the spec; with this in place, we copy
the current openstack tarballs from
static.openstack.org:/srv/static/tarballs to
/afs/openstack.org/project/tarballs.opendev.org/openstack/
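
(That copy itself is roughly just an rsync into the read-write path
followed by a release, run from a host with an admin AFS token; volume
name per the sketch above:

  rsync -avz static.openstack.org:/srv/static/tarballs/ \
      /afs/.openstack.org/project/tarballs.opendev.org/openstack/
  vos release project.tarballs

)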

We then update the openstack tarball publishing jobs to publish to
this new location via AFS (we should be able to make this happen in
parallel, initially).

Finally, we need to serve these files.

 3) I propose we make tarballs.openstack.org a vhost on
static.opendev.org that serves the
/afs/openstack.org/project/tarballs.opendev.org/openstack/
directory.

Because

 * This is transparent for tarballs.openstack.org; all URLs work with
   no redirection, etc.
 * anyone hitting tarballs.opendev.org will see top-level project
   directories (openstack, zuul, airship, etc.) which makes sense.

I think this will get us where we want to be.

Any feedback welcome, thanks.  We will keep track of things in [4].

[1] https://docs.opendev.org/opendev/infra-specs/latest/specs/retire-static.html
[2] 
https://opendev.org/opendev/system-config/src/branch/master/manifests/site.pp#L441
[3] 
https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/files/openafs/release-volumes.py
[4] https://storyboard.openstack.org/#!/story/2006598


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Creating OpenDev control-plane docker images and naming

2019-12-02 Thread Ian Wienand
On Tue, Nov 26, 2019 at 05:31:07PM +1100, Ian Wienand wrote:
> What I would propose is that projects do *not* have a single,
> top-level Dockerfile, but only (potentially many) specifically
> name-spaced versions.

> [2] I started looking at installing these together from a Dockerfile
> in system-config.  The problem is that you have a "build context",
> basically the directory the Dockerfile is in and everything under
> it.

I started looking closely at this, and I think have reversed my
position from above.  That is, I think we should keep the OpenDev
related dockerfiles in system-config.

[1] is a change in system-config to add jobs to build openstacksdk,
diskimage-builder and nodepool-[builder|launcher] containers.  It does
this by having these projects as required-projects: in the job
configuration and copying the Dockerfile into the Zuul-checked-out
source (and using that as the build context).  A bit ugly, but it
works.
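
Roughly, the job ends up doing something like this with the
Zuul-prepared checkouts (paths illustrative; see the change for the
real details):

  # Zuul checks the required-projects out under ~/src; copy the
  # system-config Dockerfile in and use that checkout as the context
  cp ~/src/opendev.org/opendev/system-config/docker/nodepool-builder/Dockerfile \
      ~/src/opendev.org/zuul/nodepool/
  cd ~/src/opendev.org/zuul/nodepool
  docker build -t nodepool-builder .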

However, to use these jobs for nodepool CI requires importing them
into zuul/nodepool.  This is tested with [2].

However, Zuul just reports:

  This change depends on a change with an invalid configuration.

It only depends-on [1], which has a valid configuration, at least in
the opendev tenant.

I think that this has to do with the zuul tenant not having the
projects that are used in required-projects: by the new system-config
jobs [3], but am not certain it doesn't have something else to do with
the config errors at [4].  I have filed [5] because at the minimum a
more helpful error would be good.

-i

[1] https://review.opendev.org/696000
[2] https://review.opendev.org/696486
[3] https://review.opendev.org/696859
[4] https://zuul.opendev.org/t/zuul/config-errors
[5] https://storyboard.openstack.org/#!/story/2006968


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Creating OpenDev control-plane docker images and naming

2019-11-25 Thread Ian Wienand
Hello,

I'm trying to get us to a point where we can use nodepool container
images in production, particularly because I want to use updated tools
available in later distributions than our current Xenial builders [1]

We have hit the hardest problem: naming :)

To build a speculative nodepool-builder container image that is
suitable for a CI job (the prerequisite for production), we need to
somehow layer openstacksdk, diskimage-builder and finally nodepool
itself into one image for testing. [2]

These all live in different namespaces, and the links between them are
not always clear.  Maybe a builder doesn't need diskimage-builder if
images come from elsewhere.  Maybe a launcher doesn't need
openstacksdk if it's talking to some other cloud.

This becomes weird when the zuul/nodepool-builder image depends on
opendev/python-base but also openstack/diskimage-builder and
openstack/openstacksdk.  You've got 3 different namespaces crossing
with no clear indication of what is supposed to work together.

I feel like we've been (or at least I have been) thinking that each
project will have *a* Dockerfile that produces some canonical image
for that project.  I think I've come to the conclusion this is
infeasible.

There can't be a single container that suits everyone, and indeed this
isn't the Zen of containers anyway.

What I would propose is that projects do *not* have a single,
top-level Dockerfile, but only (potentially many) specifically
name-spaced versions.

So for example, everything in the opendev/ namespace will be expected
to build from opendev/python-base.  Even though dib, openstacksdk and
zuul come from different source-repo namespaces, it will make sense
to have:

  opendev/python-base
  +-> opendev/openstacksdk
  +-> opendev/diskimage-builder
  +-> opendev/nodepool-builder

because these containers are expected to work together as the opendev
control plane containers.  Since opendev/nodepool-builder is defined
as an image that expected to make RAX compatible, OpenStack uploadable
images it makes logical sense for it to bundle the kitchen sink.

I would expect that nodepool would also have a Dockerfile.zuul to
create images in the zuul/ namespace as the "reference"
implementation.  Maybe that looks a lot like Dockerfile.opendev -- but
then again maybe it makes different choices and does stuff like
Windows support etc. that the opendev ecosystem will not be interested
in.  You can still build and test these images just the same; just
we'll know they're targeted at doing something different.
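
In practice that just means selecting the Dockerfile at build time,
something like (tags illustrative):

  docker build -f Dockerfile.opendev -t opendev/nodepool-builder .
  docker build -f Dockerfile.zuul -t zuul/nodepool-builder .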

As an example:

  https://review.opendev.org/696015 - create opendev/openstacksdk image
  https://review.opendev.org/693971 - create opendev/diskimage-builder

(a nodepool change will follow, but it's a bit harder as it's
cross-tenant so projects need to be imported).

Perhaps codifying that there's no such thing as *a* Dockerfile, and
possibly rules about what happens in the opendev/ namespace is spec
worthy, I'm not sure.

I hope this makes some sense!

Otherwise, I'd be interested in any and all ideas of how we basically
convert the nodepool-functional-openstack-base job to containers (that
means, bring up a devstack, and test nodepool, dib & openstacksdk with
full Depends-On: support to make sure it can build, upload and boot).
I consider that a pre-requisite before we start rolling anything out
in production.

-i

[1] I know we have ideas to work around the limitations of using host
tools to build images, but one thing at a time! :)

[2] I started looking at installing these together from a Dockerfile
in system-config.  The problem is that you have a "build context",
basically the directory the Dockerfile is in and everything under
it.  You can't reference anything outside this.  This does not
play well with Zuul, which has checked out the code for dib,
openstacksdk & nodepool into three different sibling directories.
So to speculatively build them together, you have to start copying
Zuul checkouts of code underneath your system-config Dockerfile
which is crazy.  It doesn't use any of the speculative build
registry stuff and just feels wrong because you're not building
small parts ontop of each other, as Docker is designed to do.  I
still don't really know how it will work across all the projects
for testing either.
  https://review.opendev.org/696000


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] [zuul-jobs] configure-mirrors: deprecate mirroring configuration for easy_install

2019-11-24 Thread Ian Wienand
Hello,

Today I force-merged [5] to avoid widespread gate breakage.  Because
the change is in zuul-jobs, we have a policy of announcing
deprecations.  I've written the following but not sent it to
zuul-announce (per policy) yet, as I'm not 100% confident in the
explanation.

I'd appreciate it if, once proof-read, someone could send it out
(modified or otherwise).

Thanks,

-i

--

Hello,

The recent release of setuptools 42.0.0 has broken the method used by
the configure-mirrors role to ensure easy_install (the older method of
installing packages, before pip came into widespread use [1]) would only
access the PyPI mirror.

The prior mirror setup code would set the "allow_hosts" whitelist to
the mirror host exclusively in pydistutils.cfg.  This would avoid
easy_install "leaking" access outside the specified mirror.

Change [2] in setuptools means that pip is now used to fetch packages.
Since it does not implement the constraints of the "allow_hosts"
setting, specifying this option has become an error condition.  This
is reported as:

 the `allow-hosts` option is not supported when using pip to install requirements

It has been pointed out [3] that this prior code would break any
dependency_links [4] that might be specified for the package (as the
external URLs will not match the whitelist).  Overall, there is no
desire to work around this behaviour as easy_install is considered
deprecated for any current use.

In short, this means the only solution is to remove the now
conflicting configuration from pydistutils.cfg.  Due to the urgency of
this update, it has been merged with [5] before our usual 2-week
deprecation notice.

The result of this is that older setuptools (perhaps in a virtualenv)
with jobs still using easy_install may not correctly access the
specified mirror.  Assuming jobs have access to PyPI they would still
work, although without the benefits of a local mirror.  If such jobs
are firewalled from upstream they may now fail.  We consider the
chance of jobs using this legacy install method in this situation to
be very low.

Please contact zuul-discuss [6] with any concerns.

We now return you to your regularly scheduled programming :)

[1] https://packaging.python.org/discussions/pip-vs-easy-install/
[2] 
https://github.com/pypa/setuptools/commit/d6948c636f5e657ac56911b71b7a459d326d8389
[3] https://github.com/pypa/setuptools/issues/1916
[4] https://python-packaging.readthedocs.io/en/latest/dependencies.html
[5] https://review.opendev.org/695821
[6] http://lists.zuul-ci.org/cgi-bin/mailman/listinfo/zuul-discuss


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] CentOS 8 as a Python 3-only base image

2019-09-30 Thread Ian Wienand
On Fri, Sep 27, 2019 at 11:09:22AM +, Jeremy Stanley wrote:
> I'd eventually love to see us stop preinstalling pip and virtualenv
> entirely, allowing jobs to take care of doing that at runtime if
> they need to use them.

You'd think, right?  :) But it is a bit of a can of worms ...

So pip is a part of Python 3 ... "dnf install python3" brings in
python3-pip unconditionally.  So there will always be a pip on the
host.

For CentOS 8 that's pip version 9.something (upstream is on
19.something).  This is where traditionally we've had problems;
requirements, etc. uses some syntax feature that tickles a bug in old
pip and we're back to trying to override the default version.  I think
we can agree to try and mitigate that in jobs, rather than in base
images.

But as an additional complication, CentOS 8 ships its
"platform-python" which is used by internal tools like dnf.  The thing
is, we have Python tools that could probably reasonably be considered
platform tools like "glean" which instantiates our networking.  I am
not sure if "/usr/libexec/platform-python -m pip install glean" is
considered an abuse or a good way to install against a stable Python
version.  I'll go with the latter ...

But ... platform-python doesn't have virtualenv (separate package on
Python 3).  Python documentation says that "venv" is a good way to
create a virtual environment and basically suggests it can do things
better than virtualenv because it's part of the base Python and so
doesn't have to have a bunch of hacks.  Then the virtualenv
documentation throws some shade at venv saying "a subset of
[virtualenv] has been integrated into the standard library" and lists
why virtualenv is better.  Now we have *three* choices for a virtual
environment: venv with either platform python or packaged python, or
virtualenv with packaged python.  Which should an element choose, if
they want to setup tools on the host during image build?  And how do
we stop every element having to hard-code all this logic into itself
over and over?
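
For concreteness, the platform-python + venv option would look
something like (path illustrative):

  /usr/libexec/platform-python -m venv /opt/glean
  /opt/glean/bin/pip install glean

... and whether that is sanctioned is exactly the question.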

Where I came down on this is :

https://review.opendev.org/684462 : this stops installing from source
on CentOS 8, which I think we all agree on.  It makes some opinionated
decisions in creating DIB_PYTHON_PIP and DIB_PYTHON_VIRTUALENV
variables that will "do the right thing" when used by elements:

 * Python 2 first era (trusty/centos7) will use python2 pip and virtualenv
 * Python 3 era (bionic/fedora) will use python3 pip and venv (*not*
   virtualenv)
 * RHEL8/CentOS 8 will use platform-python pip & venv
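
For example, an element install script could then just do something
like the below and not care about the distro details (treat the exact
semantics as a sketch; the review has the real definitions):

  $DIB_PYTHON_PIP install glean
  $DIB_PYTHON_VIRTUALENV /opt/glean-venv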

https://review.opendev.org/685643 : above in action; installing glean
correctly on all supported distros.

-i


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] CentOS 8 as a Python 3-only base image

2019-09-27 Thread Ian Wienand
Hello,

All our current images use dib's "pip-and-virtualenv" element to
ensure the latest pip/setuptools/virtualenv are installed, and
/usr/bin/ installs Python 2 packages and
/usr/bin/ install Python 3 packages.

The upshot of this is that all our base images have Python 2 and 3
installed (even "python 3 first" distros like Bionic).

We have to make a decision if we want to continue this with CentOS 8;
to be specific the change [1].

Installing pip and virtualenv from upstream sources has a long history
full of bugs and workarounds nobody wants to think about (if you do
want to think about it, you can start at [2]).

A major problem has been that we have to put these packages on "hold",
to avoid the situation where the packaged versions are re-installed
over the upstream versions, creating a really big mess of mixed up
versions.

I'm thinking that CentOS 8 is a good place to stop this.  We just
won't support, in dib, installing pip/virtualenv from source for
CentOS 8.  We hope for the best that the packaged versions of tools
are always working, but *if* we do require fixes to the system
packages, we will implement that inside jobs directly, rather than on
the base images.

I think in the 2019 world this is increasingly less likely, as we have
less reliance on older practices like mixing system-wide installs
(umm, yes devstack ... but we have a lot of work getting centos8
stable there anyway), and the Zuul v3 world makes it much easier to
deploy isolated fixes as roles should we need to.

If we take this path, the images will be Python 3 only -- we recently
turned Ansible's "ansible_python_interpreter" to Python 3 for Fedora
30 and after a little debugging I think that is ready to go.  Of
course jobs can install the Python 2 environment should they desire.

Any comments here, or in the review [1] welcome.

Thanks,

-i

[1] https://review.opendev.org/684462
[2] 
https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip#L73


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Weekly Infra Team Meeting Agenda for August 13, 2019

2019-08-12 Thread Ian Wienand
We will be meeting tomorrow at 19:00 UTC in #openstack-meeting on freenode with 
this agenda:

== Agenda for next meeting ==

* Announcements

* Actions from last meeting

* Specs approval

* Priority Efforts (Standing meeting agenda items. Please expand if you have 
subtopics.)
** 
[http://specs.openstack.org/openstack-infra/infra-specs/specs/task-tracker.html 
A Task Tracker for OpenStack]
** 
[http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-management.html
 Update Config Management]
*** topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev

* General topics
** Trusty Upgrade Progress (clarkb 20190813)
*** Next steps for hosting job logs in swift
** AFS mirroring status (ianw 20190813)
*** Debian buster updates not populated by reprepro but are assumed to be 
present by our mirror setup roles.
** PTG Planning (clarkb 20190813)
*** https://etherpad.openstack.org/p/OpenDev-Shanghai-PTG-2019
** New backup server (ianw 20190813)
*** https://review.opendev.org/#/c/675537

* Open discussion

Thanks,

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] opendev.org downtime Thu Jul 25 07:00 UTC 2019

2019-07-25 Thread Ian Wienand
Hello,

We received reports of connectivity issues to opendev.org at about
06:30 [1].

After some initial investigation, I could not contact
gitea-lb01.opendev.org via ipv4 or 6.

Upon checking its console I saw a range of kernel errors that suggest
the host was probably having issues with its disk [2].

I attempted to hard-reboot it, and it went into an error state.  The
initial error in the server status was

 {'message': 'Timed out during operation: cannot acquire state change lock 
(held by monitor=remoteDispatchDomainCreateWithFlags)', 'code': 500, 'created': 
'2019-07-25T07:25:25Z'}

After a short period, I tried again and got a different error state

 {'message': "internal error: process exited while connecting to monitor: 
lc=,keyid=masterKey0,iv=jHURYcYDkXqGBu4pC24bew==,format=base64 -drive 
'file=rbd:volumes/volume-41553c15-6b12-4137-a318-7caf6a9eb44c:id=cinder:auth_supported=cephx\\;none:mon_host=172.24.0.56\\:6789",
 'code': 500, 'created': '2019-07-25T07:27:21Z'}

The vexxhost status page [3] is currently not showing any outages in
the sjc1 region where this resides.

I think this probably requires vexxhost to confirm the status of the
load-balancer VM.

I tried to launch a new node, at least to have it ready in case of
bigger issues.  This failed with errors about the image service [4].
This further suggests there might be some storage issues on the
backend.

I then checked on the gitea* backend servers, and they have similar
messages in their kernel logs referring to storage too (I should have
done this first, probably).  So this again suggests it is a
region-wide issue.

I have reached out to mnaser on IRC.  I think he is GMT-4 usually so
that gives a few hours to expect a response.  This will also mean more
experienced gitea admins will be around too.  Given it appears to be a
backend provider issue, I will not take further action at this point.

Thanks,

-i

[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-07-25.log.html#t2019-07-25T06:36:51
[2] http://paste.openstack.org/show/754834/
[3] https://status.vexxhost.com/
[4] http://paste.openstack.org/show/754835/

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Meeting Agenda for July 9, 2019

2019-07-09 Thread Ian Wienand
On Mon, Jul 08, 2019 at 02:36:11PM -0700, Clark Boylan wrote:
> ** Mirror setup updates (clarkb 20190709)
> *** Do we replace existing mirrors with new opendev mirrors running openafs 
> 1.8.3?

I won't make it to the meeting tomorrow sorry, but here's the current
status, which is largely reflected in

 https://etherpad.openstack.org/p/opendev-mirror-afs

The kafs based servers have been paused for now due to the hard crashes
in fscache, which require us to monitor them very closely; that wasn't
happening over the holiday breaks.

 https://review.opendev.org/#/c/669231/

dhowells is on vacation till at least the 15th, and there is no real
prospect of those issues being looked at until after then.

There are some changes in the afs-next kernel branch for us to try,
which should help with the "volume offline" issues we saw being
reported when a "vos release" was happening (basically making kafs
switch to the other server better).  I believe that capturing logs
from our AFS servers helped debug these issues.

I can take an action item to build a kernel with them and switch it
back in for testing late next week when I am back (or someone else
can, if they like).  This will give us enough for a Tested-By: flag for
sending those changes to Linus upstream.

Once everyone is back, we can look more closely at the fscache issues,
which are currently a blocker for future work.

I'm not aware of any issues with openafs 1.8.3 based mirrors.  If we
need any new mirrors, or feel the need to replace those in production,
we should be fine bringing them up with that.

Thanks,

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] ARA 1.0 deployment plans

2019-06-17 Thread Ian Wienand
On Tue, Jun 11, 2019 at 04:39:58PM -0400, David Moreau Simard wrote:
> Although it was first implemented as somewhat of a hack to address the
> lack of scalability of HTML generation, I've gotten to like the design
> principle of isolating a job's result in a single database.
> 
> It easy to scale and keeps latency to a minimum compared to a central
> database server.

I've been ruminating on how all this can work given some constraints
of

- keep current model of "click on a link in the logs, see the ara
  results"

- no middleware to intercept such clicks with logs on swift

- don't actually know where the logs are if using swift (not just
  logs.openstack.org/xy/123456/) which makes it harder to find job
  artefacts like sqlite db's post job run (have to query gerrit or
  zuul results db?)

- some jobs, like in system-config, have "nested" ARA reports from
  subnodes; essentially reporting twice.

Can the ARA backend import a sqlite run after the fact?  I agree
adding latency to jobs running globally sending results piecemeal back
to a central db isn't going work; but if it logged everything to a
local db as now, then we uploaded that to a central location in post
that might work?  Although we can't run services/middleware on logs
directly, we could store the results as we see fit and run services on
a separate host.

If say, you had a role that sent the generated ARA sqlite.db to
ara.opendev.org and got back a UUID, then it could write into the logs
ara-report/index.html which might just be a straight 301 redirect to
https://ara.opendev.org/UUID.  This satisfies the "just click on it"
part.
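
Sketched out, with an entirely made-up upload endpoint and response,
and a static meta-refresh standing in for the redirect (the logs
themselves can't send a 301):

  uuid=$(curl -sf -F "db=@ara-report/ansible.sqlite" \
         https://ara.opendev.org/upload)
  printf '<meta http-equiv="refresh" content="0; url=https://ara.opendev.org/%s/">\n' \
      "$uuid" > ara-report/index.html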

It seems that "all" that needs to happen is that requests for
https://ara.opendev.org/uuid/api/v1/... to query either just the
results for "uuid" in the db.

And could the ara-web app (which is presumably then just statically
served from that host) know that when started as
https://ara.opendev.org/uuid it should talk to
https://ara.opendev.org/uuid/api/...?

I think though, this might be relying on a feature of the ara REST
server that doesn't exist -- the idea of unique "runs"?  Is that
something you'd have to paper-over with, say wsgi starting a separate
ara REST process/thread to respond to each incoming
/uuid/api/... request (maybe the process just starts pointing to
/opt/logs/uuid/results.sqlite)?

This doesn't have to grow indefinitely, we can similarly just have a
cron query to delete rows older than X weeks.

Easy in theory, of course ;)

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Meeting Agenda for June 18, 2019

2019-06-17 Thread Ian Wienand
== Agenda for next meeting ==

* Announcements

* Actions from last meeting

* Specs approval

* Priority Efforts (Standing meeting agenda items. Please expand if you have 
subtopics.)
** 
[http://specs.openstack.org/openstack-infra/infra-specs/specs/task-tracker.html 
A Task Tracker for OpenStack]
** 
[http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-management.html
 Update Config Management]
*** topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev
*** Next steps

* General topics
** Trusty Upgrade Progress (ianw 20190618)
** https mirror update (ianw 20190618)
*** kafs in production update
*** https://review.opendev.org/#/q/status:open+branch:master+topic:kafs

* Open discussion


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] ARA 1.0 deployment plans

2019-06-11 Thread Ian Wienand
Hello,

I started to look at the system-config base -devel job, which runs
Ansible & ARA from master (this job has been quite useful in flagging
issues early across Ansible, testinfra, ARA etc, but it takes a bit
of effort for us to keep it stable...)

It seems ARA 1.0 has moved in some directions we're not handling right
now.  Playing with [1] I've got ARA generating and uploading it's
database.

Currently, Apache matches an ara-report/ directory on
logs.openstack.org and hands the request to the ARA wsgi application,
which serves the response from the sqlite db in that directory [2].

If I'm understanding, we now need ara-web [3] to display the report
page we all enjoy.  However this web app currently only gets data from
an ARA server instance that provides a REST interface with the info?

I'm not really seeing how this fits with the current middleware
deployment? (unfortunately [4] or an analogue in the new release seems
to have disappeared).  Do we now host a separate ARA server on
logs.openstack.org on some known port that knows how to turn
/*/ara-report/ URL requests into access of the .sqlite db on disk and
thus provide the REST interface?  And then somehow we host an ara-web
instance that knows how to request from this?

Given I can't see us wanting to do a bunch of puppet hacking to get
new services on logs.openstack.org, and yet it would require fairly
non-trivial effort to get the extant bits and pieces on that server
migrated to an all-Ansible environment, I think we have to give some
thought as to how we'll roll this out (plus add in containers,
possible logs on swift, etc ... for extra complexity :)

So does anyone have thoughts on a high-level view of how this might
hang together?

-i

[1] https://review.opendev.org/#/c/664478/
[2] 
https://opendev.org/opendev/puppet-openstackci/src/branch/master/templates/logs.vhost.erb
[3] https://github.com/ansible-community/ara-web
[4] https://ara.readthedocs.io/en/stable/advanced.html

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zanata broken on Bionic

2019-04-08 Thread Ian Wienand
On Tue, Apr 02, 2019 at 12:28:31PM +0200, Frank Kloeker wrote:
> The OpenStack I18n team was aware about the fact, that we will run into an
> unsupported platform in the near future and started an investigation about
> the renew of translation platform on [1].
> [1] 
> https://blueprints.launchpad.net/openstack-i18n/+spec/renew-translation-platform

I took an action item to do some investigation in the infra meeting.
From the notes above, for last iteration it looks like it came down to
Zanata v Pootle.  However when I look at [1] Pootle doesn't look
terribly active.

It looks like Fedora haven't made any choices around which way to
go, but weblate has been suggested [2].

Looking at weblate, it seems to have a few things going for it from
the infra point of view

* it seems active
* it's a python/django app which fits our deployments and general
  skills better than java
* has a docker project [3] so has interest in containerisation
* we currently put translations in, and propose them via jobs
  triggered periodically using the zanata CLI tool as described at
  [4].  weblate has a command-line client that looks to me like it can
  do roughly what we do now [5] ... essentially integrate with jobs to
  upload new translations into the tool, and extract the translations
  and put them into gerrit.
* That said, it also seems we could integrate with it more "directly"
  [6]; it seems it can trigger imports of translations from git repos
  via webhooks (focused on github, but we could do similar with a post
  job) and also propose updates directly to gerrit (using git-review;
  documentation is light on this feature but it is there).  It looks
  like (if I'm reading it right) we could move all configuration in a
  .weblate file per-repo, which suits our distributed model.

> My recommendation would be to leave it as it is and to decide how to
> proceed.

Overall, yeah, if it ain't broke, don't fix it :)

The other thing is, I noticed that weblate has hosted options.  If the
CI integration is such that it's importing via webhooks and proposing
reviews, then it seems like this is essentially an unprivileged app.
We have sunk a lot of collective time and resources into Zanata
deployment and we should probably do a real cost-benefit analysis once
we have some more insights.

-i


[1] https://github.com/translate/pootle/commits/master
[2] 
https://lists.fedoraproject.org/archives/list/tr...@lists.fedoraproject.org/thread/PZUT5ABMNDVYBD7OUBEGVXM7YVW6RZKQ/#4J7BJQWOJDEBACSHDIB6MYWEEXHES6CW
[3] https://github.com/WeblateOrg/docker
[4] https://docs.openstack.org/i18n/latest/infra.html
[5] https://docs.weblate.org/en/latest/wlc.html
[6] https://docs.weblate.org/en/latest/admin/continuous.html

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Meeting agenda for March 26, 2019

2019-03-25 Thread Ian Wienand
== Agenda for next meeting ==

* Announcements
** Clarkb remains on vacation March 25-28

* Actions from last meeting

* Specs approval

* Priority Efforts (Standing meeting agenda items. Please expand if you have 
subtopics.)
** 
[http://specs.openstack.org/openstack-infra/infra-specs/specs/task-tracker.html 
A Task Tracker for OpenStack]
** 
[http://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-management.html
 Update Config Management]
*** topic:puppet-4 and topic:update-cfg-mgmt
*** Zuul as CD engine
** OpenDev
*** https://storyboard.openstack.org/#!/story/2004627

* General topics
** PTG planning (clarkb 20190319 / ianw 20190326)
*** https://etherpad.openstack.org/2019-denver-ptg-infra-planning

* Open discussion

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Zanata broken on Bionic

2019-03-25 Thread Ian Wienand
On Fri, Mar 15, 2019 at 11:01:44AM +0100, Andreas Jaeger wrote:
> Anybody remembers or can reach out to Zanata folks for help on
> fixing this for good, please?

From internal communication with people previously involved with
Zanata, it seems the team has disabanded and there is no current
support or, at this time, planned future development.  So
unfortunately it seems there are no "Zanata folks" at this point :(

It's a shame considering there has been significant work integrating
it into workflows, but I think we have to work under the assumption
upstream will remain inactive.

Falling back to "if it ain't broke" we can just continue with the
status quo with the proposal job running on Xenial and its java
versions for the forseeable future.  Should we reach a point
post-Xenial support lifespan, we could even consider a more limited
deployment of both the proposal job and server using containers etc.
Yes, this is how corporations end up in 2019 with RHEL5 servers
running Python 2.4 :)

Ultimately though, it's probably something the I18n team needs to
discuss and infra can help with any decisions made.

-i


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] [Release-job-failures] Release of openstack-infra/jenkins-job-builder failed

2018-12-10 Thread Ian Wienand
On Fri, Dec 07, 2018 at 12:02:00PM +0100, Thierry Carrez wrote:
> Looks like the readthedocs integration for JJB is misconfigured, causing the
> trigger-readthedocs-webhook to fail ?

Thanks for pointing this out.  After investigation it doesn't appear
to be misconfigured in any way, but it seems that RTD have started
enforcing the need for csrf tokens for the POST we use to notify it to
build.

This appears to be new behaviour, and possibly incorrectly applied
upstream (I'm struggling to think why it's necessary here).

I've filed

 https://github.com/rtfd/readthedocs.org/issues/4986

which hopefully can open a conversation about this.  Let's see what
comes of that...

*If* we have no choice but to move to token based authentication, I
did write the role to handle that.  But it involves every project
maintaining its own secrets and us having to rework the jobs, which is
not difficult but also not trivial.  So let's hope it doesn't come to
that ...

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Proposed changes to how we run our meeting

2018-11-20 Thread Ian Wienand
On Sun, Nov 18, 2018 at 11:09:29AM -0800, Clark Boylan wrote:
> Both ideas seem sound to me and I think we should try to implement
> them for the Infra team. I propose that we require agenda updates 24
> hours prior to the meeting start time and if there are no agenda
> updates we cancel the meeting. Curious to hear if others think this
> will be helpful and if 24 hours is enough lead time to be helpful.

My concern here is that we have standing items of priority tasks
updates that are essentially always there, and action item follow-up
from the prior meeting.  Personally I often find them very useful.

Having attended many in-person waffling weekly "status update"
meetings etc. I feel the infra one *is* very agenda focused.  I also
think there is never an expectation anyone is in the meeting; in fact
more so that we actively understand and expect people aren't there.

So I think it would be fine to send out the agenda 24 hours in
advance, and make a rule that new items posted after that will skip to
the next week, so that if there's nothing of particular interest people
can plan to skip.

This would involve managing the wiki page better IMO.  I always try to
tag my items with my name and date for discussion because clearing it
out is an asychronous operation.  What if we made the final thing in
the meeting after general discussion "reset agenda" so we have a
synchronisation point, and then clearly mark on the wiki page that
it's now for the next meeting date?

But I don't like that infra in general skips the meeting.  Apart from
the aforementioned standing items, people start thinking "oh my thing
is just little, I don't want to call a meeting for it", which is the
opposite of what we need to keep communication flowing.
actively involved but remote like myself, it's a loss of a very
valuable hour to catch up on what's happening even with just the
regular updates.

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [requirements][vitrage][infra] SQLAlchemy-Utils version 0.33.6 breaks Vitrage gate

2018-10-23 Thread Ian Wienand
On Thu, Oct 18, 2018 at 01:17:13PM +, Jeremy Stanley wrote:
> It's been deleted (again) and the suspected fix approved so
> hopefully it won't recur.

Unfortunately the underlying issue is still a mystery.  It recurred
once after the suspected fix was merged [1], and despite trying to
replicate it mostly in-situ we could not duplicate the issue.

Another change [2] has made our builds use a modified pip [3] which
logs the sha256 hash of the .whl outputs.  If this reappears, we can
look at the logs and the final (corrupt) wheel and see if the problem
is coming from pip, or something after that as we copy the files.
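
If anyone does want to poke at a suspect wheel from the mirror, the
comparison is basically (filename illustrative):

  sha256sum SQLAlchemy_Utils-0.33.6-py2.py3-none-any.whl   # compare with the build log
  unzip -t SQLAlchemy_Utils-0.33.6-py2.py3-none-any.whl    # quick integrity check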

If looking at hexdumps of zip files is your idea of a good time, there
are some details on the corruption in the comments of [2].  Any
suggestions welcome :) Also any corruption reports welcome too, and we
can continue investigation.

Thanks,

-i

[1] https://review.openstack.org/611444
[2] https://review.openstack.org/612234
[3] https://github.com/pypa/pip/pull/5908

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] Control Plane Server Upgrade Sprint Planning

2018-09-18 Thread Ian Wienand
On Mon, Sep 17, 2018 at 04:09:03PM -0700, Clark Boylan wrote:
> October 15-19 may be our best week for this. Does that week work?

Post school-holidays here so SGTM :)

> Let me know if you are working on upgrading any servers/services and
> I will do what I can to help review changes and make that happen as
> well.

I will start on graphite.o.o as I now have some experience getting it
listening on ipv6 :) I think it's mostly package install and few
templating bits, ergo it might work well as ansible roles (i.e. don't
have to rewrite tricky logic).  I'll see how it starts to look ...

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [Release-job-failures] Tag of openstack/python-neutronclient failed

2018-09-10 Thread Ian Wienand
> On Mon, Sep 10, 2018 at 05:13:35AM +, z...@openstack.org wrote:
>> Build failed.
>>
>> - publish-openstack-releasenotes 
>> http://logs.openstack.org/c8/c89ca61fdcaf603a10750b289228b7f9a3597290/tag/publish-openstack-releasenotes/fbbd0fa/
>>  : FAILURE in 4m 03s

The line that is causing this is

  - Add OSC plugin support for the “Networking Service Function Chaining” ...

see if you can find the unicode :)
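
One way to spot it is something like:

  grep -nP '[^\x00-\x7F]' releasenotes/notes/*.yaml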

I did replicate it by mostly doing what the gate does; make a python2
virtualenv and install everything, then run

 ./env/bin/sphinx-build -a -E -W -d releasenotes/build/doctrees/ \
   -b html releasenotes/source/ releasenotes/build/html/

In the gate, it doesn't use "tox -e releasenotes" ... which passes
because it's python3 and everything is unicode already.

I think this is a reno problem, and I've proposed

  https://review.openstack.org/601432 Use unicode for debug string

Thanks

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [networking-odl][networking-bgpvpn][Telemetry] all requirement updates are currently blocked

2018-09-09 Thread Ian Wienand

On 09/10/2018 09:39 AM, Tony Breeds wrote:

Julien, Do you mind me arranging for at least the following versions to
be published to pypi?


For this particular case, I think our best approach is to have an
admin manually upload the tar & wheels from tarballs.openstack.org to
pypi.  All other options seem to be sub-optimal:

 - if we re-ran the release pipeline, I *think* it would all be
   idempotent and the publishing would happen, but there would be
   confusing duplicate release emails sent.

 - we could make a special "only-publish" template that avoids
   notification jobs; switch ceilometer to this, re-run the releases,
   then switch back.  urgh, especially if something goes wrong.

 - ceilometer could make "no-op" releases on each branch to trigger a
   fresh release & publish; but releases that essentially do nothing
   are, I imagine, an annoyance for users and distributors who track
   stable branches.
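
Roughly, the manual path is just (versions and filenames illustrative;
worth a dry run against test.pypi with --repository-url first):

  wget https://tarballs.openstack.org/ceilometer/ceilometer-11.0.1.tar.gz \
       https://tarballs.openstack.org/ceilometer/ceilometer-11.0.1-py2-none-any.whl
  twine upload -u openstackci ceilometer-11.0.1*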

It would look like

  https://test.pypi.org/project/ceilometer/

The pypi hashes will all line up with the .asc files we publish, so we
know there's no funny business going on.

Thanks,

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all]-ish : Updates required for readthedocs publishers

2018-09-05 Thread Ian Wienand

On 09/06/2018 02:10 AM, Doug Hellmann wrote:

Those instructions and the ones linked at
https://docs.openstack.org/infra/openstack-zuul-jobs/project-templates.html#project_template-docs-on-readthedocs
say to "generate a web hook URL".


I think you got the correct answers, thanks Dmitry.  Note it is also
illustrated at

 https://imgur.com/a/Pp4LH31

Thanks

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [all]-ish : Updates required for readthedocs publishers

2018-09-05 Thread Ian Wienand
Hello,

If you're interested in the projects mentioned below, you may have
noticed a new, failing, non-voting job
"your-readthedocs-job-requires-attention".  Spoiler alert: your
readthedocs job requires attention.  It's easy to miss because
publishing happens in the post pipeline and people don't often look
at the results of these jobs.

Please see the prior email on this

 http://lists.openstack.org/pipermail/openstack-dev/2018-August/132836.html

for what to do (if you read the failing job logs, it also points you
to this).

I (or #openstack-infra) can help, but only once the openstackci user
is given permissions to the RTD project by its current owner.

Thanks,

-i

The following projects have this job now:

openstack-infra/gear
openstack/airship-armada
openstack/almanach
openstack/ansible-role-bindep
openstack/ansible-role-cloud-launcher
openstack/ansible-role-diskimage-builder
openstack/ansible-role-cloud-fedmsg
openstack/ansible-role-cloud-gearman
openstack/ansible-role-jenkins-job-builder
openstack/ansible-role-logrotate
openstack/ansible-role-ngix
openstack/ansible-role-nodepool
openstack/ansible-role-openstacksdk
openstack/ansible-role-shade
openstack/ansible-role-ssh
openstack/ansible-role-sudoers
openstack/ansible-role-virtualenv
openstack/ansible-role-zookeeper
openstack/ansible-role-zuul
openstack/ara
openstack/bareon
openstack/bareon-allocator
openstack/bareon-api
openstack/bareon-ironic
openstack/browbeat
openstack/downpour
openstack/fuel-ccp
openstack/fuel-ccp-installer
openstack/fuel-noop-fixtures
openstack/ironic-staging-drivers
openstack/k8s-docker-suite-app-murano
openstack/kloudbuster
openstack/nerd-reviewer
openstack/networking-dpm
openstack/nova-dpm
openstack/ooi
openstack/os-faults
openstack/packetary
openstack/packetary-specs
openstack/performa
openstack/poppy
openstack/python-almanachclient
openstack/python-jenkins
openstack/rally
openstack/solar
openstack/sqlalchemy-migrate
openstack/stackalytics
openstack/surveil
openstack/swauth
openstack/turbo-hipster
openstack/virtualpdu
openstack/vmtp
openstack/windmill
openstack/yaql

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] Launch node and the new bridge server

2018-08-28 Thread Ian Wienand

On 08/28/2018 09:48 AM, Clark Boylan wrote:

On Mon, Aug 27, 2018, at 4:21 PM, Clark Boylan wrote:
One quick new observation. launch-node.py does not install puppet at
all so the subsequent ansible runs on the newly launched instances
will fail when attempting to stop the puppet service (and will
continue on to fail to run puppet as well I think).


I think we should manage puppet on the hosts from Ansible; we did
discuss that we could just manually run
system-config:install_puppet.sh after launching the node; but while
that script does contain some useful things for getting various puppet
versions, it also carries a lot of extra cruft from years gone by.

I've proposed the roles to install puppet in [1].  This runs the roles
under Zuul for integration testing.

For the control-plane, we need a slight tweak to the inventory writer
to pass through groups [2] and then we can add the roles to the base
playbook [3].

Thanks,

-i

[1] https://review.openstack.org/596968
[2] https://review.openstack.org/596994
[3] https://review.openstack.org/596997

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Request to keep Fedora 28 images around until Fedora 30 comes out

2018-08-02 Thread Ian Wienand
On 08/03/2018 04:45 AM, Clark Boylan wrote:
> On Thu, Aug 2, 2018, at 9:57 AM, Alex Schultz wrote:
> As a note, Fedora 28 does come with python2.7. It is installed so
> that Zuul related ansible things can execute under python2 on the
> test nodes. There is the possibility that ansible's python3 support
> is working well enough that we could switch to it, but that
> requires testing and updates to software and images and config.

Python 3 only images are possible -- dib has a whole "dib-python"
thing for running python scripts inside the chroot in a distro &
version independent way -- but not with the pip-and-virtualenv element
setup we do as that drags in both [1].  You can go through a "git log"
of that element to see some of the many problems :)

OpenStack has always managed to tickle bugs in
pip/setuptools/virtualenv which is why we go to the effort of
installing the latest out-of-band.  This is not to say it couldn't be
reconsidered, especially for a distro like Fedora which packages
up-to-date packages.  But this would definitely be the first
port-of-call for anyone interested in going down that path in infra.

-i

[1] 
https://git.openstack.org/cgit/openstack/diskimage-builder/tree/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[openstack-dev] [all][docs] ACTION REQUIRED for projects using readthedocs

2018-08-02 Thread Ian Wienand
Hello,

tl;dr : any projects using the "docs-on-readthedocs" job template
to trigger a build of their documentation in readthedocs need to:

 1) add the "openstackci" user as a maintainer of the RTD project
 2) generate a webhook integration URL for the project via RTD
 3) provide the unique webhook ID value in the "rtd_webhook_id" project
variable

See

 
https://docs.openstack.org/infra/openstack-zuul-jobs/project-templates.html#project_template-docs-on-readthedocs

--

readthedocs has recently updated their API for triggering a
documentation build.  In the old API, anyone could POST to a known URL
for the project and it would trigger a build.  This end-point has
stopped responding and we now need to use an authenticated webhook to
trigger documentation builds.
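
If I have it right, the new trigger boils down to an authenticated
POST against the generated webhook URL, something like (project slug
and id illustrative):

  curl -X POST -u "openstackci:$RTD_PASSWORD" \
      https://readthedocs.org/api/v2/webhook/your-project/12345/

which is presumably why the openstackci user needs to be a maintainer.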

Since this is only done in the post and release pipelines, projects
probably haven't had great feedback that current methods are failing
and this may be a surprise.  To check your publishing, you can go to
the zuul builds page [1] and filter by your project and the "post"
pipeline to find recent runs.

There is now some setup required which can only be undertaken by a
current maintainer of the RTD project.

In short; add the "openstackci" user as a maintainer, add a "generic
webhook" integration to the project, find the last bit of the URL from
that and put it in the project variable "rtd_webhook_id".

Luckily OpenStack infra keeps a team of highly skilled digital artists
on retainer and they have produced a handy visual guide available at

  https://imgur.com/a/Pp4LH31

Once the RTD project is setup, you must provide the webhook ID value
in your project variables.  This will look something like:

 - project:
templates:
  - docs-on-readthedocs
  - publish-to-pypi
vars:
  rtd_webhook_id: '12345'
check:
  jobs:
  ...

For actual examples; see pbrx [2] which keeps its config in tree, or
gerrit-dash-creator which has its configuration in project-config [3].

Happy to help if anyone is having issues, via mail or #openstack-infra

Thanks!

-i

p.s. You don't *have* to use the jobs from the docs-on-readthedocs
templates and hence add infra as a maintainer; you can setup your own
credentials with zuul secrets in tree and write your playbooks and
jobs to use the generic role [4].  We're always happy to discuss any
concerns.

[1] https://zuul.openstack.org/builds.html
[2] https://git.openstack.org/cgit/openstack/pbrx/tree/.zuul.yaml#n17
[3] 
https://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml
[4] https://zuul-ci.org/docs/zuul-jobs/roles.html#role-trigger-readthedocs

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [all] Ongoing spam in Freenode IRC channels

2018-07-31 Thread Ian Wienand

Hello,

It seems freenode is currently receiving a lot of unsolicited traffic
across all channels.  The freenode team are aware [1] and doing their
best.

There are not really a lot of options.  We can set "+r" on channels
which means only nickserv registered users can join channels.  We have
traditionally avoided this, because it is yet one more barrier to
communication when many are already unfamiliar with IRC access.
However, having channels filled with irrelevant messages is also not
very accessible.

This is temporarily enabled in #openstack-infra for the time being, so
we can co-ordinate without interruption.

Thankfully AFAIK we have not needed an abuse policy on this before;
but I guess we are at the point where we need some sort of coordinated
response.

I'd suggest to start, people with an interest in a channel can request
+r from an IRC admin in #openstack-infra and we track it at [2]

Longer term ... suggestions welcome? :)

-i

[1] https://freenode.net/news/spambot-attack
[2] https://etherpad.openstack.org/p/freenode-plus-r-08-2018

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] OpenStack lagging behind 2 major python versions: we need a Python 3.7 gate

2018-07-17 Thread Ian Wienand
On 07/13/2018 06:38 AM, Thomas Goirand wrote:
> Now, both Debian and Ubuntu have Python 3.7. Every package which I
> upload in Sid need to support that. Yet, OpenStack's CI is still
> lagging with Python 3.5.

OpenStack's CI is rather broad -- I'm going to assume we're talking
about whole-system devstack-ish based functional tests.  Yes most
testing is on Xenial and hence Python 3.5

We have Python 3.6 available via Bionic nodes.  I think current talk
is to look at mass-updates after the next release.  Such updates, from
history, are fairly disruptive.

> I'm aware that there's been some attempts in the OpenStack infra to
> have Debian Sid (which is probably the distribution getting the
> updates the faster).

We do not currently build Debian sid images, or mirror the unstable
repos or do wheel builds for Debian.  diskimage-builder also doesn't
test it in CI.  This is not to say it can't be done.

> If it cannot happen with Sid, then I don't know, choose another
> platform, and do the Python 3-latest gating...

Fedora has been consistently updated in OpenStack Infra for many
years.  IMO, and from my experience, six-monthly-ish updates are about
as frequent as can be practically handled.

The ideal is that a (say) Neutron dev gets a clear traceback from a
standard Python error in their change and happily fixes it.  The
reality is probably more like this developer gets a tempest
failure due to nova failing to boot a cirros image, stemming from a
detached volume due to a qemu bug that manifests due to a libvirt
update (I'm exaggerating, I know :).

That sort of deeply tangled platform issue always exists; however it
is amortised across the lifespan of the testing.  So several weeks
after we update all these key components, a random Neutron dev can be
pretty sure that submitting their change is actually testing *their*
change, and not really a defacto test of every other tangentially
related component.

A small, but real example; uwsgi wouldn't build with the gcc/glibc
combo on Fedora 28 for two months after its release until uwsgi's
2.0.17.1.  Fedora carried patches; but of course there were a lot
previously unconsidered assumptions in devstack around deployment that
made using the packaged versions difficult [1] (that stack still
hasn't received any reviews).

Nobody would claim diskimage-builder is the greatest thing ever, but
it does produce our customised images in a wide variety of formats
that runs in our very heterogeneous clouds.  It's very reactive -- we
don't know about package updates until they hit the distro, and
sometimes that breaks assumptions.  It's largely taken for granted in
our CI, but it takes a constant sustained effort across the infra team
to make sure we have somewhere to test.

I hear myself sounding negative, but I think it's a fundamental
problem.  You can't be dragging in the latest of everything AND expect
that you won't be constantly running off fixing weird things you never
even knew existed.  We can (and do) get to the bottom of these things,
but if the platform changes again before you've even fixed the current
issue, things start piling up.

If the job is constantly broken it gets ignored -- if a non-voting
job fails in the woods, does it make a sound? :)

> When this happens, moving faster with Python 3 versions will be
> mandatory for everyone, not only for fools like me who made the
> switch early.

This is a long way of saying that - IMO - the idea of putting out a
Debian sid image daily (to a lesser degree Buster images) and throwing
a project's devstack runs against it is unlikely to produce a good
problems-avoided : development-resources ratio.  However, prove me
wrong :)

If people would like to run their master against Fedora (note
OpenStack's stable branch lifespan is generally longer than a given
Fedora release is supported, so it is not much good there) you have
later packages, but still a fairly practical 6-month-ish stability
cadence.  I'm happy to help (some projects do already).

> 

With my rant done :) ... there's already discussion around multiple
python versions, containers, etc in [2].  While I'm reserved about the
idea of full platform functional tests, essentially having a
wide-variety of up-to-date tox environments using some of the methods
discussed there is, I think, a very practical way to be cow-catching
some of the bigger issues with Python version updates.  If we are to
expend resources, my 2c worth is that pushing in that direction gives
the best return on effort.

-i

[1] https://review.openstack.org/#/c/565923/
[2] http://lists.openstack.org/pipermail/openstack-dev/2018-July/132152.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[OpenStack-Infra] mirror.opensuse : AFS file name size issues

2018-06-17 Thread Ian Wienand
Hi,

It seems like the opensuse mirror has been on a bit of a growth spurt
[1].  Monitoring alerted me that the volume had not released for
several days, which lead me to look at the logs.

The rsync is failing with "File too large (27)" as it goes through
the tumbleweed sync.

As it turns out, AFS has a hard limit on the combined size of the
file names within a directory.  There are a couple of threads [2]
around from people who have found this out in pretty much the same way
as me ... when it starts failing :)

So you have 64k slots per directory, and file metadata+name takes up
slots per the formula:

 /* Find out how many entries are required to store a name. */
 int
 afs_dir_NameBlobs(char *name)
 {
 int i;
 i = strlen(name) + 1;
 return 1 + ((i + 15) >> 5);
 }

This means we have a problem with the large opensuse
tumbleweed/repo/oss/x86_64 directory, which has a lot of files with
quite long names.  Please, check my command/math, but if you run the
following command:

 $ rsync --list-only rsync://mirrors.rit.edu/opensuse/tumbleweed/repo/oss/x86_64/ \
   | awk '
 function slots(x) {
   i = length(x)+1;
   return 1 + rshift((i+15), 5)
 }
 { n += slots($5) }
 END {print n}
'

I come out with 82285, which is significantly more than the 64k slots
available.

I don't know what to do here, and it's really going to be up to people
interested in opensuse.  The most immediate thing is unnecessary
packages could be pruned from tumbleweed/repo/oss/x86_64/ during the
rsync.  Where unnecessary is in the eye of the beholder ... :)
See my note below, but it may have to be quite under 64k.
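
As a sketch of what pruning during the rsync could look like -- the
exclude patterns and local AFS path here are invented purely for
illustration, and deciding on real ones is the hard part:

  rsync -rlptDvz \
    --exclude='*-debuginfo-*.rpm' \
    --exclude='*-debugsource-*.rpm' \
    rsync://mirrors.rit.edu/opensuse/tumbleweed/repo/oss/x86_64/ \
    /afs/.openstack.org/mirror/opensuse/tumbleweed/repo/oss/x86_64/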

If we have any sway with upstream, maybe they could shard this
directory; similar to debian [3] or fedora [4] (that said, centos does
the same thing [5], but due to less packages and shorter names etc
it's only about 40% allocated).

Note that (open)AFS doesn't hard-link across directories, so some sort
of "rsync into smaller directories then hardlink tree" doesn't really
work.

Ideas, suggestions, reviews welcome :)

-- ps

There's an additional complication in that the slots fragment over
time, and the slots for a given file name must be contiguous.  This
means in practice you get even fewer usable slots.

There is potential to "defrag" (I bet post Windows 95 you never
thought you'd hear that again :) by rebuilding the directories with
the salvager [6].  However, there are additional complications
again...

To simply do this we have to run a complete salvage of the *entire*
partition.  Although I have added "-salvagedirs" to afs01's
salvageserver (via [7]) in an attempt to do this for just one volume,
it turns out this is not obeyed until after [8] which is not in the
Xenial AFS version we use.  I really do not want to salvage all the
other volumes, most of which are huge.  The other option is to create
a new AFS server, move the volume to that so it's the only thing on
the partition, and run it there, then move it back [9].

I actually suspect an "rm -rf *" might also do it, and probably be
faster, because we'd only move the data down once from the remote
mirror, rather than to a new server and back.

But defragging is rather secondary if the directory is oversubscribed
anyway.

-i

[1] 
http://grafana02.openstack.org/d/ACtl1JSmz/afs?orgId=1=now-7d=now=28
[2] https://lists.openafs.org/pipermail/openafs-info/2016-July/041859.html
[3] http://mirror.iad.rax.openstack.org/debian/pool/main/
[4] 
http://mirror.iad.rax.openstack.org/fedora/releases/28/Everything/x86_64/os/Packages/
[5] http://mirror.iad.rax.openstack.org/centos/7/os/x86_64/Packages/
[6] http://docs.openafs.org/Reference/8/salvager.html
[7] https://docs.openstack.org/infra/system-config/afs.html#updating-settings
[8] https://gerrit.openafs.org/#/c/12461/

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] afs02 r/o volume mirrors - resolved

2018-05-26 Thread Ian Wienand

On 05/25/2018 08:00 PM, Ian Wienand wrote:

I am now re-running the sync in a root screen on afs02 with -localauth
so it won't timeout.


I've now finished syncing back all R/O volumes on afs02, and the update
cron jobs have been running successfully.

Thanks,

-i


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] afs02 r/o volume mirrors - ongoing incident

2018-05-25 Thread Ian Wienand

On 05/24/2018 11:36 PM, Ian Wienand wrote:

Thanks to the help of Jeffrey Altman [1], we have managed to get
mirror.pypi starting to resync again.


And thanks to user error on my behalf, and identified by jeblair, in
the rush of all this I ran this under k5start on mirror-update,
instead of on one of the afs hosts with -localauth, so the ticket
timed out and the release failed.

---
root@mirror-update01:~# k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos release mirror.pypi

Kerberos initialization for service/afsad...@openstack.org

Release failed: rxk: authentication expired
Could not end transaction on a ro volume: rxk: authentication expired
 Could not update VLDB entry for volume 536870931
Failed to end transaction on the release clone 536870932
Could not release lock on the VLDB entry for volume 536870931
rxk: authentication expired
Error in vos release command.
rxk: authentication expired
---

If it is any consolation, it's the type of mistake you only make once :)

I am now re-running the sync in a root screen on afs02 with -localauth
so it won't timeout.  Expect it to finish about 20 hours from this
mail :/

Thanks,

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] afs02 r/o volume mirrors - ongoing incident

2018-05-24 Thread Ian Wienand
On 05/24/2018 08:45 PM, Ian Wienand wrote:
> On 05/24/2018 05:40 PM, Ian Wienand wrote:
>> In an effort to resolve this, the afs01 & 02 servers were restarted to
>> clear all old transactions, and for the affected mirrors I essentially
>> removed their read-only copies and re-added them with:
> 
> It seems this theory of removing the volumes and re-adding them is not
> sufficient to get things working; "vos release" is still failing.  I
> have sent a message to the openafs-devel list [1] with details and
> logs.

Thanks to the help of Jeffrey Altman [1], we have managed to get
mirror.pypi starting to resync again.  This is running in the root
screen on mirror-update.o.o (sorry, I forgot the "-v" on the command).

For reference, you can look at the transaction and see it receiving
data, e.g.

 root@afs02:/var/log/openafs# vos status -verbose -server localhost -localauth 
 Total transactions: 1
 --
 transaction: 62  created: Thu May 24 12:58:23 2018
 lastActiveTime: Thu May 24 12:58:23 2018
 volumeStatus: 
 volume: 536870932  partition: /vicepa  procedure: Restore
 packetRead: 2044135  lastReceiveTime: Thu May 24 13:33:17 2018
 packetSend: 1  lastSendTime: Thu May 24 13:33:17 2018
 --

Assuming this goes OK over the next few hours, that leaves
mirror.ubuntu and mirror.ubuntu-ports as the last two out-of-sync
mirrors.  As we do not want to run large releases in parallel, we can
tackle this when pypi is back in sync.

Thanks,

-i

[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-05-24.log.html#t2018-05-24T12:57:39

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] afs02 r/o volume mirrors - ongoing incident

2018-05-24 Thread Ian Wienand
On 05/24/2018 05:40 PM, Ian Wienand wrote:
> In an effort to resolve this, the afs01 & 02 servers were restarted to
> clear all old transactions, and for the affected mirrors I essentially
> removed their read-only copies and re-added them with:

It seems this theory of removing the volumes and re-adding them is not
sufficient to get things working; "vos release" is still failing.  I
have sent a message to the openafs-devel list [1] with details and
logs.

We should probably see if any help can be gained from there.

If not, I'm starting to think that removing all R/O volumes, a "rm -rf
/vicepa/*" on afs02 and then starting the R/O mirrors again might be
an option?

If we critically need the mirrors updated, we can "vos remove" the R/O
volumes from any mirror and run an update just on afs01.  However note
that mirror-update.o.o is still in the emergency file and all cron
jobs stopped.

-i

[1] https://lists.openafs.org/pipermail/openafs-devel/2018-May/020491.html

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] afs02 r/o volume mirrors - ongoing incident

2018-05-24 Thread Ian Wienand
Hi,

We were notified of an issue around 22:45GMT with the volumes backing
the storage on afs02.dfw.o.o, which holds R/O mirrors for our AFS
volumes.

It seems that during this time there were a number of "vos release"s
in flight, or started, that ended up with volumes in a range of
unreliable states that made them un-releaseable (essentially halting
mirror updates).

Several of the volumes were recoverable with a manual "vos unlock" and
re-releasing the volume.  However, others were not.

To keep it short, fairly extensive debugging took place [2], but we
had corrupt volumes and deadlocked transactions between afs01 & afs02
with no reasonable solution.

In an effort to resolve this, the afs01 & 02 servers were restarted to
clear all old transactions, and for the affected mirrors I essentially
removed their read-only copies and re-added them with:

 k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos unlock $MIRROR
 k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos remove -server afs02.dfw.openstack.org -partition a -id $MIRROR.readonly
 k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos release -v $MIRROR
 k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos addsite -server afs02.dfw.openstack.org -partition a -id $MIRROR

The following volumes needed to be recovered

 mirror.fedora
 mirror.pypi
 mirror.ubuntu
 mirror.ubuntu-ports
 mirror.debian

(these are the largest repositories, and maybe it's no surprise that's
why they became corrupt?)

I have placed mirror-update.o.o in the emergency file, and commented
out all cron jobs on it.

Right now, I am running a script in a screen as the root user on
mirror-update.o.o to "vos release" these in sequence
(/root/release.sh).  Hopefully, this brings thing back into sync by
recreating the volumes.  If not, more debugging will be required :/

Please feel free to check in on this, otherwise I will update tomorrow
.au time

-i

[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-05-23.log.html#t2018-05-23T22:43:46
[2] 
http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-05-24.log.html#t2018-05-24T04:01:21

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [openstack-infra] How to take over a project?

2018-04-18 Thread Ian Wienand

On 04/19/2018 01:19 AM, Ian Y. Choi wrote:

By the way, since the networking-onos-release group has no neutron
release team group, I think infra team can help to include neutron
release team and neutron release team can help to create branches
for the repo if there is no reponse from current
networking-onos-release group member.


This seems sane and I've added neutron-release to
networking-onos-release.

I'm hesitant to give advice on branching within a project like neutron
as I'm sure there's stuff I'm not aware of; but members of the
neutron-release team should be able to get you going.

Thanks,

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [openstack-infra] How to take over a project?

2018-04-17 Thread Ian Wienand
On 04/17/2018 12:00 PM, Sangho Shin wrote:
> I would like to know how to take over an OpenStack project.  I am a
> committer of the networking-onos project
> (https://github.com/openstack/networking-onos
> ), and I would like to
> take over the project.

> The current maintainer (cc’d) has already agreed with that.

> Please let me know the process to take over (or change the
> maintainer of) the project.

Are you talking about the github project or the gerrit project?
Github is a read-only mirror of the project from gerrit.

You appear to already be a member of networking-onos-core [1] so you
have permissions to approve and reject changes.

> BTW, it looks like even the current maintainer cannot create a new
> branch of the codes. How can we get the authority to create a new
> branch?

Are you following something like [2]?

-i

[1] https://review.openstack.org/#/admin/groups/1001,members
[2] https://docs.openstack.org/infra/manual/drivers.html#feature-branches

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [devstack][infra] pip vs psutil

2018-04-16 Thread Ian Wienand

On 04/15/2018 09:32 PM, Gary Kotton wrote:

The gate is currently broken with
 https://launchpad.net/bugs/1763966. https://review.openstack.org/#/c/561427/
 Can unblock us in the short term. Any other ideas?


I'm thinking this is probably along the lines of the best idea.  I
left a fairly long comment on this in [1], but the root issue here is
that if a system package is created using distutils (rather than
setuptools) we end up with this problem with pip10.

That means the problem occurs when we a) try to overwrite a system
package and b) that package has been created using distutils.  This
means it is a small(er) subset of packages that cause this problem.
Ergo, our best option might be to see if we can avoid such packages on
a one-by-one basis, like here.

In some cases, we could just delete the .egg-info file, which is
approximately what was happening before anyway.
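
i.e. something like the following, which is roughly what pre-10 pip
did for us implicitly (the path is illustrative and varies between
distros and python versions):

  # remove the distutils-generated metadata so pip 10 will install
  # over the top of the system package
  sudo rm -rf /usr/lib/python2.7/site-packages/psutil-*.egg-info
  sudo pip install -U psutil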

In this particular case, the psutils package is used by glance & the
peakmem tracker.  Under USE_PYTHON3, devstack's pip_install_gr only
installs the python3 library; however the peakmem tracker always uses
python2 -- leading to the missing-library failures in [2].  I have two
thoughts; either install for both python2 & 3 always [3] or make
peakmem tracker obey USE_PYTHON3 [4].  We can discuss the approach in
the reviews.

The other option is to move everything to virtualenv's, so we never
conflict with a system package, as suggested by clarkb [5] or
pabelanger [6].  These are more invasive changes, but also arguably
more correct.

Note diskimage-builder, and hence our image generation for some
platforms, is also broken.  Working on that in [7].

-i


[1] https://github.com/pypa/pip/issues/4805#issuecomment-340987536
[2] https://review.openstack.org/561427
[3] https://review.openstack.org/561524
[4] https://review.openstack.org/561525
[5] https://review.openstack.org/558930
[6] https://review.openstack.org/#/c/552939
[7] https://review.openstack.org/#/c/561479/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] Selecting New Priority Effort(s)

2018-04-09 Thread Ian Wienand

On 04/06/2018 11:37 PM, Jens Harbott wrote:

I didn't intend to say that this was easier. My comment was related
to the efforts in https://review.openstack.org/558991 , which could
be avoided if we decided to deploy askbot on Xenial with
Ansible. The amount of work needed to perform the latter task would
not change, but we could skip the intermediate step, assuming that
we would start implementing 1) now instead of deciding to do it at a
later stage.


I disagree with this; having found a myriad of issues it's *still*
simpler than re-writing the whole thing IMO.

It doesn't matter, ansible, puppet, chef, bash scripts -- the
underlying problem is that we choose support libraries for postgres,
solr, celery, askbot, logs etc etc, get it to deploy, then forget
about it until the next LTS release 2 years later.  Of course the
whole world has moved on, but we're pinned to old versions of
everything and never tested on new platforms.

What *would* have helped is an rspec test that even just applies
the manifest on new platforms.  We have great infrastructure for these
tests; but most of our modules don't actually *run* anything (e.g.,
here's ethercalc and etherpad-lite issues too [1,2]).

These make it so much easier to collaborate; we can all see the result
of changes, link to logs, get input on what's going wrong, etc etc.

-i

[1] https://review.openstack.org/527822
[2] https://review.openstack.org/528130

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] Asking for ask.openstack.org

2018-04-04 Thread Ian Wienand

On 04/05/2018 10:23 AM, Zane Bitter wrote:

On 04/04/18 17:26, Jimmy McArthur wrote:
Here's the thing: email alerts. They're broken.


This is the type of thing we can fix if we know about it ... I will
contact you off-list because the last email to what I presume is you
went to an address that isn't what you've sent from here, but it was
accepted by the remote end.

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Asking for ask.openstack.org

2018-04-04 Thread Ian Wienand
On 04/05/2018 08:30 AM, Paul Belanger wrote:
> We likely need to reduce the number of days we retain database
> backups / http logs or look to attach a volume to increase storage.

We've long had problems with this host and I've looked at it before
[1].  It often drops out.

It seems there's enough interest we should dive a bit deeper.  Here's
what I've found out:

askbot
--

The askbot site itself seems under control, except for an unbounded
session log file.  Proposed [2]

 root@ask:/srv# du -hs *
 2.0G   askbot-site
 579M   dist

overall
---

The major consumer is /var; where we've got

 3.9G   log
 5.9G   backups
 9.4G   lib

backups
---

The backups seem under control at least; we're rotating them out and we
keep 10, and the size is pretty consistently 500mb:

 root@ask:/var/backups/pgsql_backups# ls -lh
 total 5.9G
 -rw-r--r-- 1 root root 599M Apr  5 00:03 askbotdb.sql.gz
 -rw-r--r-- 1 root root 598M Apr  4 00:03 askbotdb.sql.gz.1
 ...

We could reduce the backup rotations to just one if we like -- the
server is backed up nightly via bup, so at any point we can get
previous dumps from there.  bup should de-duplicate everything, but
still, it's probably not necessary.

The db directory was sitting at ~9gb

 root@ask:/var/lib/postgresql# du -hs
 8.9G   .

AFAICT, it seems like the autovacuum is running OK on the busy tables

 askbotdb=# select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
            from pg_stat_user_tables where last_autovacuum is not NULL;
      relname      | last_vacuum |        last_autovacuum        |         last_analyze          |       last_autoanalyze
 ------------------+-------------+-------------------------------+-------------------------------+-------------------------------
  django_session   |             | 2018-04-02 17:29:48.329915+00 | 2018-04-05 02:18:39.300126+00 | 2018-04-05 00:11:23.456602+00
  askbot_badgedata |             | 2018-04-04 07:19:21.357461+00 |                               | 2018-04-04 07:18:16.201376+00
  askbot_thread    |             | 2018-04-04 16:24:45.124492+00 |                               | 2018-04-04 20:32:25.845164+00
  auth_message     |             | 2018-04-04 12:29:24.273651+00 | 2018-04-05 02:18:07.633781+00 | 2018-04-04 21:26:38.178586+00
  djkombu_message  |             | 2018-04-05 02:11:50.186631+00 |                               | 2018-04-05 02:14:45.22926+00

Out of interest I did run a manual

 su - postgres -c "vacuumdb --all --full --analyze"

We dropped something

 root@ask:/var/lib/postgresql# du -hs
 8.9G   .
 (after)
 5.8G   

I installed pg_activity and watched for a while; nothing seemed to be
really stressing it.

Ergo, I'm not sure if there's much to do in the db layers.

logs


This leaves the logs

 1.1G   jetty
 2.9G   apache2

The jetty logs are cleaned regularly.  I think they could be made more
quiet, but they seem to be bounded.

Apache logs are rotated but never cleaned up.  Surely logs from 2015
aren't useful.  Proposed [3]
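
Conceptually that is just a bounded logrotate policy along these
lines (retention numbers here are illustrative only, not necessarily
what the change proposes):

  /var/log/apache2/*.log {
    weekly
    rotate 12
    compress
    delaycompress
    missingok
    notifempty
  }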

Random offline
--

[3] is an example of a user reporting the site was offline.  Looking
at the logs, it seems that puppet found httpd not running at 07:14 and
restarted it:

 Apr  4 07:14:40 ask puppet-user[20737]: (Scope(Class[Postgresql::Server])) 
Passing "version" to postgresql::server is deprecated; please use 
postgresql::globals instead.
 Apr  4 07:14:42 ask puppet-user[20737]: Compiled catalog for ask.openstack.org 
in environment production in 4.59 seconds
 Apr  4 07:14:44 ask crontab[20987]: (root) LIST (root)
 Apr  4 07:14:49 ask puppet-user[20737]: 
(/Stage[main]/Httpd/Service[httpd]/ensure) ensure changed 'stopped' to 'running'
 Apr  4 07:14:54 ask puppet-user[20737]: Finished catalog run in 10.43 seconds

Which first explains why when I looked, it seemed OK.  Checking the
apache logs we have:

 [Wed Apr 04 07:01:08.144746 2018] [:error] [pid 12491:tid 140439253419776] 
[remote 176.233.126.142:43414] mod_wsgi (pid=12491): Exception occurred 
processing WSGI script '/srv/askbot-site/config/django.wsgi'.
 [Wed Apr 04 07:01:08.144870 2018] [:error] [pid 12491:tid 140439253419776] 
[remote 176.233.126.142:43414] IOError: failed to write data
 ... more until ...
 [Wed Apr 04 07:15:58.270180 2018] [:error] [pid 17060:tid 140439253419776] 
[remote 176.233.126.142:43414] mod_wsgi (pid=17060): Exception occurred 
processing WSGI script '/srv/askbot-site/config/django.wsgi'.
 [Wed Apr 04 07:15:58.270303 2018] [:error] [pid 17060:tid 140439253419776] 
[remote 176.233.126.142:43414] IOError: failed to write data

and the restart logged

 [Wed Apr 04 07:14:48.912626 2018] [core:warn] [pid 21247:tid 140439370192768] 
AH00098: pid file /var/run/apache2/apache2.pid overwritten -- Unclean shutdown 
of previous Apache run?
 [Wed Apr 04 07:14:48.913548 2018] [mpm_event:notice] [pid 21247:tid 
140439370192768] AH00489: Apache/2.4.7 (Ubuntu) OpenSSL/1.0.1f mod_wsgi/3.4 
Python/2.7.6 configured -- resuming normal operations
 [Wed Apr 04 

Re: [OpenStack-Infra] Problems setting up my own OpenStack Infrastructure

2018-04-04 Thread Ian Wienand

   * Puppet doesn't create the /var/log/nodepool/images log directory


Note that since [1] the builder log output changed; previously it went
through python logging into the directory you mention, now it is
written into log files directly in /var/log/nodepool/builds (by
default)


   * The command "service nodepool-builder start" seems to start a
     nodepool process that immediately aborts


You may be seeing the result of a bad logging configuration file.  In
this case, the daemonise happens correctly (so systemd thinks it
worked) but it crashes soon after, but before any useful logging is
captured. I have a change out for that in [2] (reviews appreciated :)

Let me see how far I can get on my own. Thanks much for the offer to
tutor me on the IRC; I will watch out for you in my morning. Our
time difference is between 13 hours (EDT) and 16 hours (PDT) if you
are located in the continental US, i.e. 7pm EDT is 8am next day here
in Japan.


FWIW there are a couple of us in APAC who are happy to help too.  IRC
will always be the most immediate way however :)

-i

[1] https://review.openstack.org/#/c/542386/
[2] https://review.openstack.org/#/c/547889/

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Options for logstash of ansible tasks

2018-03-28 Thread Ian Wienand
On 03/28/2018 11:30 AM, James E. Blair wrote:
> As soon as I say that, it makes me think that the solution to this
> really should be in the log processor.  Whether it's a grok filter, or
> just us parsing the lines looking for task start/stop -- that's where we
> can associate the extra data with every line from a task.  We can even
> generate a uuid right there in the log processor.

I'd agree the logstash level is probably where to do this.  How to
achieve that ...

In trying to bootstrap myself on the internals of this, one thing I've
found is that the multi-line filter [1] is deprecated for the
multiline codec plugin [2].

We make extensive use of this deprecated filter [3].  It's not clear
how we can go about migrating away from it?  The input is coming in as
"json_lines" as basically a json-dict -- with a tag that we then use
different multi-line matches for.

From what I can tell, it seems like the work of dealing with
multiple-lines has actually largley been put into filebeat [5] which
is analagous to our logstash-workers (it feeds the files into
logstash).

Ergo, do we have to add multi-line support to the logstash-pipeline,
so that events sent into logstash are already bundled together?
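
For reference, the codec form from the upstream docs looks roughly
like the below (the tcp/port bit is illustrative only); the open
question is how that composes with our json_lines input, since the
codec attaches to the input itself:

  input {
    tcp {
      port => 9999
      codec => multiline {
        pattern => "^\s"
        what => "previous"
      }
    }
  }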

-i

[1] https://www.elastic.co/guide/en/logstash/2.4/plugins-filters-multiline.html
[2] 
https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html
[3] 
https://git.openstack.org/cgit/openstack-infra/logstash-filters/tree/filters/openstack-filters.conf
[4] 
http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/logstash/input.conf.erb
[5] 
https://www.elastic.co/guide/en/beats/filebeat/current/multiline-examples.html

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Options for logstash of ansible tasks

2018-03-27 Thread Ian Wienand
I wanted to query for a failing ansible task; specifically what would
appear in the console log as

 2018-03-27 15:07:49.294630 | 
 2018-03-27 15:07:49.295143 | TASK [configure-unbound : Check for IPv6]
 2018-03-27 15:07:49.368062 | primary | skipping: Conditional result was False
 2018-03-27 15:07:49.400755 | 

While I can do

 message:"configure-unbound : Check for IPv6"

I want to correlate that with a result, looking also for the matching

 skipping: Conditional result was False

as the result of the task.  AFAICT, there is no way in kibana to
enforce a match on consecutive lines like this (as it has no concept
they are consecutive).

I considered a few things.  We could conceivably group everything
between "TASK" and a blank " | " into a single entry with a multiline
filter.  It was pointed out that this would make, for example, the
entire devstack log a single entry, however.

The closest other thing I could find was "aggregate" [1]; but this
relies on having a unique task-id to group things together with.
Ansible doesn't give us this in the logs and AFAIK doesn't have a
concept of a uuid for tasks.
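
To make that concrete, aggregate wants something shaped like the
following (a sketch only; the grok pattern and ruby are illustrative),
which shows the problem -- there is nothing stable and unique to put
in task_id:

  filter {
    grok {
      match => { "message" => "TASK \[%{DATA:task_name}\]" }
    }
    aggregate {
      task_id => "%{task_name}"
      code => "map['task'] = event.get('task_name')"
    }
  }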

So I'm at a bit of a loss as to how we could effectively index ansible
tasks so we can determine the intermediate values or results of
individual tasks?  Any ideas?

-i

[1] 
https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Adding new etcd binaries to tarballs.o.o

2018-03-27 Thread Ian Wienand

On 03/28/2018 01:04 AM, Jeremy Stanley wrote:

I would be remiss if I failed to remind people that the *manually*
installed etcd release there was supposed to be a one-time stop-gap,
and we were promised it would be followed shortly with some sort of
job which made updating it not-manual. We're coming up on a year and
it looks like people have given in and manually added newer etcd
releases at least once since. If this file were important to
testing, I'd have expected someone to find time to take care of it
so that we don't have to. If that effort has been abandoned by the
people who originally convinced us to implement this "temporary"
workaround, we should remove it until it can be supported properly.


In reality we did fix it, as described with the
use-from-cache-or-download changes in the prior mail.  I even just
realised I had submitted, and then forgotten about, [1] to remove the
tarballs.o.o pointer; it never got reviewed, and that setting then got
copied into the new devstack zuulv3 jobs [2].

Anyway, we got there in the end :) I'll add to my todo list to clear
them from tarballs.o.o once this settles out.

-i

[1] https://review.openstack.org/#/c/508022/
[2] https://review.openstack.org/#/c/554977/

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [tripleo][infra][dib] Gate "out of disk" errors and diskimage-builder 2.12.0

2018-03-21 Thread Ian Wienand

On 03/21/2018 03:39 PM, Ian Wienand wrote:

We will prepare dib 2.12.1 with the fix.  As usual there are
complications, since the dib gate is broken due to unrelated triple-o
issues [2].  In the mean time, probably avoid 2.12.0 if you can.



[2] https://review.openstack.org/554705


Since we have having issues getting this verified due to some
instability in the tripleo gate, I've proposed a temporary removal of
the jobs for dib in [1].

[1] https://review.openstack.org/555037

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [infra][dib] Gate "out of disk" errors and diskimage-builder 2.12.0

2018-03-20 Thread Ian Wienand

Hi,

We had a small issue with dib's 2.12.0 release that means it creates
the root partition with the wrong partition type [1].  The result is
that a very old check in sfdisk fails, and growpart then can not
expand the disk -- which means you may have seen jobs that usually
work fine run out of disk space.

This slipped by because our functional testing doesn't test growpart;
an oversight we will correct in due course.

The bad images should have been removed, so a recheck should work.

We will prepare dib 2.12.1 with the fix.  As usual there are
complications, since the dib gate is broken due to unrelated triple-o
issues [2].  In the mean time, probably avoid 2.12.0 if you can.

Thanks,

-i

[1] https://review.openstack.org/554771
[2] https://review.openstack.org/554705

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [infra][all] Anyone using our ubuntu-mariadb mirror?

2018-03-14 Thread Ian Wienand
Hello,

We discovered an issue with our mariadb package mirroring that
suggests it hasn't been updating for some time.

This would be packages from

 http://mirror.X.Y.openstack.org/ubuntu-mariadb/10.<1|2>

This was originally added in [1].  AFAICT from codesearch, it is
currently unused.  We export the top-level directory in the mirror
config scripts as NODEPOOL_MARIADB_MIRROR, which is not referenced in
any jobs [2], and I couldn't find anything setting up apt repos
pointing to it.

Thus since it's not updating and nothing seems to reference it, I am
going to assume it is unused and remove it next week.  If not, please
respond and we can organise a fix.

-i

[1] https://review.openstack.org/#/c/307831/
[2] 
http://codesearch.openstack.org/?q=NODEPOOL_MARIADB_MIRROR=nope==

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] [infra][nova] Corrupt nova-specs repo

2018-03-04 Thread Ian Wienand
On 06/30/2017 04:11 PM, Ian Wienand wrote:
> Unfortunately it seems the nova-specs repo has undergone some
> corruption, currently manifesting itself in an inability to be pushed
> to github for replication.

We haven't cleaned this up, due to wanting to do it during a rename
transition which hasn't happened yet due to zuulv3 rollout.

We had reports that github replication was not working.  Upon checking
the queue, nova-specs was suspicious.

...
07141063  Mar-02 08:04  (retry 3810) [d7122c96] push 
g...@github.com:openstack/nova-specs.git
4e27c57e waiting  Mar-02 08:12  [ee1b1935] push 
g...@github.com:openstack/networking-bagpipe.git
... so on ...
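
(For future reference, that listing comes from the gerrit ssh admin
interface, i.e. something like

  ssh -p 29418 review.openstack.org gerrit show-queue

run with an administrator account.)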

Checking out the logs, nova-specs tries to push itself and fails
constantly, per the previous mail.  However, usually we get an error
and things continue on; e.g.

[2018-03-02 08:04:56,439] [d7122c96] Cannot replicate to 
g...@github.com:openstack/nova-specs.git
org.eclipse.jgit.errors.TransportException: 
g...@github.com:openstack/nova-specs.git: error occurred during unpacking on 
the remote end: index-pack abnormal exit

Something seems to have happened at

[2018-03-02 08:05:58,065] [d7122c96] Push to 
g...@github.com:openstack/nova-specs.git references:

Because this never returned an error, or seemingly returned at all.  From that
point, no more attempts were made by the replication thread(s) to push
to github; jobs were queued but nothing happened.  I killed that task,
but no progress appeared to be made and the replication queue
continued to climb.  I couldn't find any other useful messages in the
logs; but they would be around that time if they were there.

I've restarted gerrit and replication appears to be moving again.  I'm
thinking maybe we should attempt to fix this separate to renames,
because at a minimum it makes debugging quite hard as it floods the
logs.  I'll bring it up in this week's meeting.

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[openstack-dev] [devstack] Jens Harbott added to core

2018-03-04 Thread Ian Wienand
Hello,

Jens Harbott (frickler) has agreed to take on core responsibilities in
devstack, so feel free to bug him about reviews :)

We have also added the members of qa-release in directly to
devstack-core, just for visibility (they already had permissions via
qa-release -> devstack-release -> devstack-core).

We have also added devstack-core as grenade core to hopefully expand
coverage there.

---

Always feel free to give a gentle ping on reviews that don't seem have
received sufficient attention.

But please also take a few minutes to compose a commit message!  I
think sometimes devs have been deep in the weeds with their cool
change and devstack requires just a few tweaks.  It's easy to forget
not all reviewers may have this same context.  A couple of
well-crafted sentences can avoid pulling projects and "git blame"
archaeological digs, which gets everything going faster!

Thanks,

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Zanata upgrade to version 4

2018-02-28 Thread Ian Wienand

On 02/27/2018 09:32 PM, Frank Kloeker wrote:

We will take the chance now to upgrade our translation platform to a
new version.


This has been completed and translate.o.o is now running 4.3.3.  For
any issues reply, or catch any infra-root in #openstack-infra

Thanks

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] Adding ARM64 cloud to infra

2018-02-22 Thread Ian Wienand
On 02/02/2018 05:15 PM, Ian Wienand wrote:
> - Once that is done, it should be straight forward to add a
>nodepool-builder in the cloud and have it build images, and zuul
>should be able to launch them just like any other node (famous last
>words).

This roughly turned out to be correct :)

In short, we now have ready xenial arm64 based nodes.  If you request
an ubuntu-xenial-arm64 node it should "just work"
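
For reference, requesting one in a job definition looks roughly like
the below (the job name is made up for illustration):

  - job:
      name: my-project-arm64-unit
      nodeset:
        nodes:
          - name: primary
            label: ubuntu-xenial-arm64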

There are some caveats:

 - I have manually installed a diskimage-builder with the changes from
   [1] downwards onto nb03.openstack.org.  These need to be finalised
   and a release tagged before we can remove nb03 from the emergency
   file (just means, don't run puppet on it).  Reviews welcome!

 - I want to merge [2] and related changes to expose the image build
   logs, and also the webapp end-points so we can monitor active
   nodes, etc.  It will take some baby-sitting so I plan on doing this
   next week.

 - We have mirror.cn1.linaro.openstack.org, but it's not mirroring
   anything that useful for arm64.  We need to sort out mirroring of
   ubuntu ports, maybe some wheel builds, etc.

 - There's currently capacity for 8 nodes.  So please take that into
   account when adding jobs.

Everything seems in good shape at the moment.  For posterity, here is
the first ever arm64 ready node:

 nodepool@nl03:/var/log/nodepool$ nodepool list | grep arm64
 | 0002683657 | linaro-cn1 | ubuntu-xenial-arm64 | c7bb6da6-52e5-4aab-88f1-ec0f1b392a0c | 211.148.24.200 | ready | 00:00:03:43 | unlocked |

:)

-i

[1] https://review.openstack.org/547161
[2] https://review.openstack.org/543671

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] [nodepool] Restricting images to specific nodepool builders

2018-02-19 Thread Ian Wienand

On 02/20/2018 02:23 AM, Paul Belanger wrote:

Why not just split the builder configuration file? I don't see a
need to add code to do this.


I'm happy with this; I was just coming at it from an angle of not
splitting the config file, but KISS :)


I did submit support homing diskimage builds to specific builder[2] a while
back, which is more inline with what ianw is asking. This allows us to assign
images to builders, if set.



[2] https://review.openstack.org/461239/


Only comment on this is that I think it might be better to avoid
putting specific hostnames in there directly; but rather add meta-data
to diskimage configurations describing the features they need on the
builder, and have the builder then only choose those builds it knows
it can do.  Feels more natural for the message-queue/scale-out type
environment where we can add/drop hosts at will.

We've two real examples to inform design; needing the Xenial build
host when all the others were trusty, and now the arm64 based ones.

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] [nodepool] Restricting images to specific nodepool builders

2018-02-18 Thread Ian Wienand
Hi,

How should we go about restricting certain image builds to specific
nodepool builder instances?  My immediate issue is with ARM64 image
builds, which I only want to happen on a builder hosted in an ARM64
cloud.

Currently, the builders go through the image list and check "is the
existing image missing or too old, if so, build" [1].  Additionally,
all builders share a configuration file [2]; so builders don't know
"who they are".

I'd propose we add an arbitrary tag/match system so that builders can
pickup only those builds they mark themselves capable of building?

e.g. diskimages would specify required builder tags similar to:

---
diskimages:
  - name: arm64-ubuntu-xenial
elements:
  - block-device-efi
  - vm
  - ubuntu-minimal
  ...
env-vars:
  TMPDIR: /opt/dib_tmp
  DIB_CHECKSUM: '1'
  ...
builder-requires:
  architecture: arm64
---

The nodepool.yaml would grow another section similar:

---
builder-provides:
  architecture: arm64
  something_else_unique_about_this_builder: true
---

For OpenStack, we would template this section in the config file via
puppet in [2], ensuring that only our theoretical ARM64 build
machine had that section in its config.

The nodepool-builder build loop can then check that its
builder-provides section has all the tags specified in an image's
"builder-requires" section before deciding to start building.

Thoughts welcome :)

-i

[1] 
https://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/builder.py#n607
[2] 
https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/nodepool.yaml

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [Release-job-failures] release-post job for openstack/releases failed

2018-02-08 Thread Ian Wienand
On 02/09/2018 02:35 PM, Tony Breeds wrote:
>> - tag-releases 
>> http://logs.openstack.org/bd/bd802368fe546a891b89f78fec89d3ea9964c155/release-post/tag-releases/ffc68e7/
>>  : TIMED_OUT in 32m 19s
>> - publish-static publish-static : SKIPPED
> 
> Can we please re-run these jobs.

Done with [1]

-i

[1] 
http://logs.openstack.org/bd/bd802368fe546a891b89f78fec89d3ea9964c155/release-post/tag-releases/2cdfded/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] Adding ARM64 cloud to infra

2018-02-01 Thread Ian Wienand

Hi,

A quick status update on the integration of the Linaro aarch64 cloud

- Everything is integrated into the system-config cloud-launcher bits,
  so all auth tokens are in place, keys are deploying, etc.

- I've started with a mirror.  So far only a minor change to puppet
  required for the ports sources list [1].  It's a bit bespoke at the
  moment but up as mirror.cn1.linaro.openstack.org.

- AFS is not supported out-of-the-box.  There is a series at [2] that
  I've been working on today, with some success.  I have custom
  packages at [3] which seem to work and can see our mirror
  directories.  I plan to puppet this in for our immediate needs, and
  keep working to get it integrated properly upstream.

- For building images, we are getting closer.  The series at [4] is
  still very WIP but can produce a working gpt+efi image.  I don't see
  any real blockers there; work will continue to make sure we get the
  interface if not perfect, at least not something we totally regret
  later :)

- Once that is done, it should be straight forward to add a
  nodepool-builder in the cloud and have it build images, and zuul
  should be able to launch them just like any other node (famous last
  words).

Thanks all,

-i

[1] https://review.openstack.org/539083
[2] https://gerrit.openafs.org/11940
[3] https://tarballs.openstack.org/package-afs-aarch64/
[4] https://review.openstack.org/#/c/539731/

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Adding ARM64 cloud to infra

2018-01-18 Thread Ian Wienand

On 01/13/2018 03:54 AM, Marcin Juszkiewicz wrote:

UEFI expects GPT and DIB is completely not prepared for it.


I feel like we've made good progress on this part, with sufficient
GPT support in [1] to get started on the EFI part

... which is obviously where the magic is here.  This is my first
rodeo building something that boots on aarch64, but not yours I've
noticed :)

I've started writing some notes at [2] and anyone is welcome to edit,
expand, add notes on testing etc etc.  I've been reading through the
cirros implementation and have more of a handle on it; I'm guessing
we'll need to do something similar in taking distro grub packages and
put them in place manually.  Any notes on testing very welcome :)

Cheers,

-i

[1] https://review.openstack.org/#/c/533490/
[2] https://etherpad.openstack.org/p/dib-efi

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Adding ARM64 cloud to infra

2018-01-15 Thread Ian Wienand

On 01/16/2018 12:11 AM, Frank Jansen wrote:

do you have any insight into the availability of a physical
environment for the ARM64 cloud?



I’m curious, as there may be a need for downstream testing, which I
would assume will want to make use of our existing OSP CI framework.


Sorry, not 100% sure what you mean here?  I think the theory is that
this would be an ARM64 based cloud attached to OpenStack infra and
thus run any jobs infra could ...

-i


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Adding ARM64 cloud to infra

2018-01-15 Thread Ian Wienand

On 01/13/2018 01:26 PM, Ian Wienand wrote:

In terms of implementation, since you've already looked, I think
essentially diskimage_builder/block_device/level1.py create() will
need some moderate re-factoring to call a gpt implementation in
response to a gpt label, which could translate self.partitions into a
format for calling parted via our existing exec_sudo.



bringing up a sample config and test, then working backwards from what
calls we expect to see


I've started down this path with

 https://review.openstack.org/#/c/533490/

... still very wip

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Adding ARM64 cloud to infra

2018-01-12 Thread Ian Wienand
On 01/13/2018 05:01 AM, Jeremy Stanley wrote:
> On 2018-01-12 17:54:20 +0100 (+0100), Marcin Juszkiewicz wrote:
> [...]
>> UEFI expects GPT and DIB is completely not prepared for it. I made
>> block-layout-arm64.yaml file and got it used just to see "sorry,
>> mbr expected" message.
> 
> I concur. It looks like the DIB team would welcome work toward GPT
> support based on the label entry at
> https://docs.openstack.org/diskimage-builder/latest/user_guide/building_an_image.html#module-partitioning
> and I find https://bugzilla.redhat.com/show_bug.cgi?id=1488557
> suggesting there's probably also interest within Red Hat for it as
> well.

Yes, it would be welcome.  So far it's been a bit of a "nice to have"
which has kept it low priority, but a concrete user could help our
focus here.

>> You have whole Python class to create MBR bit by bit when few
>> calls to 'sfdisk/gdisk' shell commands do the same.
> 
> Well, the comments at
> http://git.openstack.org/cgit/openstack/diskimage-builder/tree/diskimage_builder/block_device/level1/mbr.py?id=5d5fa06#n28
> make some attempt at explaining why it doesn't just do that instead
> (at least as of ~7 months ago?).

I agree with the broad argument of this sentiment; that writing a
binary-level GPT implementation is out of scope for dib (and the
existing MBR one is, with hindsight, something I would have pushed
back on more).

dib-block-device being in python is a double edged sword -- on the one
hand it's harder to drop in a few lines like in shell, but on the
other hand it has proper data structures, unit testing, logging and
config-reading abilities -- things that all are rather ugly, or get
lost with shell.  The code is not perfect, but doing more things like
[1,2] to enhance and better use libraries will help everyone (and
notice that's making it easier to translate directly to parted, no
coincidence :)

The GPL linkage issue, as described in the code, prevents us doing the
obvious thing and calling directly via python.  But I believe we will
be OK just making system() calls to parted to configure GPT;
especially given the clearly modular nature of it all.

In terms of implementation, since you've already looked, I think
essentially diskimage_builder/block_device/level1.py create() will
need some moderate re-factoring to call a gpt implementation in
response to a gpt label, which could translate self.partitions into a
format for calling parted via our existing exec_sudo.
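
To make that concrete, the calls I'd expect such an implementation to
end up issuing look roughly like the following -- purely illustrative,
with the device, sizes and exact flag handling (esp vs boot on older
parted) all placeholders to be worked out:

  parted -s /dev/loop0 -- mklabel gpt
  parted -s /dev/loop0 -- mkpart ESP fat32 1MiB 9MiB
  parted -s /dev/loop0 -- set 1 esp on
  parted -s /dev/loop0 -- mkpart root ext4 9MiB 100%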

This is highly amenable to a test-driven development scenario as we
have some pretty good existing unit tests for various parts of the
partitioning to template from (for example, tests/test_lvm.py).  So
bringing up a sample config and test, then working backwards from what
calls we expect to see is probably a great way to start.  Even if you
just want to provide some (pseudo)shell examples based on your
experience and any thoughts on the yaml config files it would be
helpful.
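
For the config side, I'd imagine something shaped like the existing
MBR layout, e.g. (a straw-man only; the field names, and especially
how partition type GUIDs are expressed, are exactly what needs
discussion):

  - local_loop:
      name: image0

  - partitioning:
      base: image0
      label: gpt
      partitions:
        - name: ESP
          type: 'EF00'
          size: 8MiB
        - name: root
          size: 100%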

--

I try to run the meetings described in [3] if there is anything on the
agenda.  The cadence is probably not appropriate for this, we can do
much better via mail here, or #openstack-dib in IRC.  I hope we can
collaborate in a positive way; as I mentioned I think as a first step
we'd be best working backwards from what we expect to see in terms of
configuration, partition layout and parted calls.

Thanks,

-i

[1] https://review.openstack.org/#/c/503574/
[2] https://review.openstack.org/#/c/503572/
[3] https://wiki.openstack.org/wiki/Meetings/diskimage-builder

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [qa][requirements] CentOS libvirt versus newton/ocata libvirt-python

2018-01-11 Thread Ian Wienand

On 01/12/2018 02:53 PM, Matthew Thode wrote:

First, about newton, it's dead (2017-10-11).


Yeah, there were a few opt-outs, which is why I think devstack still
runs it.  Not worth a lot of effort.


Next, about ocata, it looks like it can support newer libvirt, but
just because a distro updated a library doesn't mean we have to
update.  IIRC, for ubuntu they use cloud-archives to get the right
version of libvirt, does something like that exist for
centos/redhat?


Well, cloud-archives provides backports of more recent things, whereas
I think we're in a situation of having too-recent libraries in the
base platform.
than Trusty v Xenial, say, but fundamentally the same I guess.  The
answer may be "Ocata not supported on 7.4".

p.s. I hope I'm understanding the python-libvirt compat story
correctly.  AIUI any newer python-binding release will build against
older versions of libvirt.  But an old version of python-libvirt may
not build against a newer release of the C libraries?

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [qa][requirements] CentOS libvirt versus newton/ocata libvirt-python

2018-01-11 Thread Ian Wienand
Hi,

So I guess since CentOS included libvirt 3.2 (7-1708, or around RHEL
7.4), it's been incompatible with libvirt-python requirements of 2.1.0
in newton [1] and 2.5.0 in ocata [2] (pike, at 3.5.0, works).

Do we want to do anything about this?  I can think of several options

* bump the libvirt-python versions on older branches

* Create an older centos image (can't imagine we have the person
  bandwidth to maintain this)

* Hack something in devstack (seems rather pointless to test
  something so far outside deployments).

* Turn off CentOS testing for old devstack branches

None are particularly appealing...

(I'm sorry if this has been discussed, I have great déjà vu about it,
maybe we were talking about it at summit or something).

-i

[1] 
http://logs.openstack.org/48/531248/2/check/legacy-tempest-dsvm-neutron-full-centos-7/80fa903/logs/devstacklog.txt.gz#_2018-01-09_05_14_40_960
[2] 
http://logs.openstack.org/50/531250/2/check/legacy-tempest-dsvm-neutron-full-centos-7/1c711f5/logs/devstacklog.txt.gz#_2018-01-09_20_43_08_833

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] Adding ARM64 cloud to infra

2018-01-11 Thread Ian Wienand

On 01/10/2018 08:41 PM, Gema Gomez wrote:

1. Control-plane project that will host a nodepool builder with 8 vCPUs,
8 GB RAM, 1TB storage on a Cinder volume for the image building scratch
space.

Does this mean you're planning on using diskimage-builder to produce
the images to run tests on?  I've seen occasional ARM things come by,
but of course diskimage-builder doesn't have CI for it (yet :) so it's
status is probably "unknown".

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] ze04 & #532575

2018-01-10 Thread Ian Wienand
Hi,

To avoid you having to pull apart the logs starting ~ [1], we
determined that ze04.o.o was externally rebooted at 01:00UTC (there is
a rather weird support ticket which you can look at, which is assigned
to a rackspace employee but in our queue, saying the host became
unresponsive).

Unfortunately that left a bunch of jobs orphaned and necessitated a
restart of zuul.

However, recent changes to not run the executor as root [2] were thus
partially rolled out on ze04 as it came up after reboot.  As a
consequence when the host came back up the executor was running as
root with an invalid finger server.

The executor on ze04 has been stopped, and the host placed in the
emergency file to avoid it coming back.  There are now some in-flight
patches to complete this transition, which will need to be staged a
bit more manually.

The other executors have been left as is, based on the KISS theory
they shouldn't restart and pick up the code until this has been dealt
with.

Thanks,

-i


[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-01-11.log.html#t2018-01-11T01:09:20
[2] https://review.openstack.org/#/c/532575/

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [requirements][vitrage] Networkx version 2.0

2018-01-07 Thread Ian Wienand

On 12/21/2017 02:51 AM, Afek, Ifat (Nokia - IL/Kfar Sava) wrote:

There is an open bug in launchpad about the new release of Networkx
2.0, that is backward incompatible with versions 1.x [1].


From diskimage-builder's POV, we can pretty much switch whenever
ready, just a matter of merging [2] after constraints is bumped.

It's kind of annoying supporting both versions at once in the code.
If we've got changes ready to go with all the related projects in [1],
bumping *should* cause minimal disruption.

-i


[1] https://bugs.launchpad.net/diskimage-builder/+bug/1718576

[2] https://review.openstack.org/#/c/506524/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] Xenial Upgrade Sprint Recap

2017-12-18 Thread Ian Wienand

On 12/19/2017 01:53 AM, James E. Blair wrote:

Ian Wienand <iwien...@redhat.com> writes:


There's a bunch of stuff that wouldn't show up until live, but we
probably could have got a lot of prep work out of the way if the
integration tests were doing something.  I didn't realise that although
we run the tests, most of our modules don't actually have any tests
run ... even something very simple like "apply without failures"


Don't the apply tests do that?


Not really; since they do a --noop run they find things like syntax
issues, dependency loops, missing statements etc; but this does leave a
lot of room for other failures.

For example, our version of puppet-nodejs was warning on Xenial "this
platform not supported, I'll try to use sensible defaults", which
passed through the apply tests -- but wasn't actually working when it
came to really getting nodejs on the system alongside
etherpad/ethercalc.

I also think there was some false sense of security since (now called)
legacy-puppet-beaker-rspec-infra was working ... even though *it* was
a noop too.

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[OpenStack-Infra] Gate Issues

2017-12-08 Thread Ian Wienand
Hello,

Just to save people reverse-engineering IRC logs...

At ~04:00UTC frickler called out that things had been sitting in the
gate for ~17 hours.

Upon investigation, one of the stuck jobs was a
legacy-tempest-dsvm-neutron-full job
(bba5d98bb7b14b99afb539a75ee86a80) as part of
https://review.openstack.org/475955

Checking the zuul logs, it had sent that to ze04

  2017-12-07 15:06:20,962 DEBUG zuul.Pipeline.openstack.gate: Build > started

However, zuul-executor was not running on ze04.  I believe there were
issues with this host yesterday.  "/etc/init.d/zuul-executor start" and
"service zuul-executor start" reported as OK, but didn't actually
start the daemon.  Rather than debug, I just used
_SYSTEMCTL_SKIP_REDIRECT=1 and that got it going.  We should look into
that, I've noticed similar things with zuul-scheduler too.
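
For the record, the workaround was roughly the following (exact
invocation from memory):

---
# "service zuul-executor start" reports OK but silently redirects to
# systemd and the daemon never appears; bypass the LSB redirect so the
# init script itself runs
$ sudo _SYSTEMCTL_SKIP_REDIRECT=1 /etc/init.d/zuul-executor start
---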

At this point, the evidence suggested zuul was waiting for jobs that
would never return.  Thus I saved the queues, restarted zuul-scheduler
and re-queued.

Soon after frickler again noticed that releasenotes jobs were now
failing with "could not import extension openstackdocstheme" [1].  We
suspect [2].

However, the gate did not become healthy.  Upon further investigation,
the executors are very frequently failing jobs with

 2017-12-08 06:41:10,412 ERROR zuul.AnsibleJob: [build: 11062f1cca144052afb733813cdb16d8] Exception while executing job
 Traceback (most recent call last):
   File "/usr/local/lib/python3.5/dist-packages/zuul/executor/server.py", line 588, in execute
     str(self.job.unique))
   File "/usr/local/lib/python3.5/dist-packages/zuul/executor/server.py", line 702, in _execute
   File "/usr/local/lib/python3.5/dist-packages/zuul/executor/server.py", line 1157, in prepareAnsibleFiles
   File "/usr/local/lib/python3.5/dist-packages/zuul/executor/server.py", line 500, in make_inventory_dict
     for name in node['name']:
 TypeError: unhashable type: 'list'

This is leading to the very high "retry_limit" failures.

We suspect change [3] as it made some changes in the node area.  I
did not want to revert this via a force-merge, and I unfortunately don't
have time to do something like apply it manually on the host and babysit it
(I did not have time for a short email, so I sent a long one instead :)

At this point, I sent the alert to warn people the gate is unstable,
which is about the latest state.

Good luck,

-i

[1] 
http://logs.openstack.org/95/526595/1/check/build-openstack-releasenotes/f38ccb4/job-output.txt.gz
[2] https://review.openstack.org/525688
[3] https://review.openstack.org/521324

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Caching zanata-cli?

2017-12-03 Thread Ian Wienand
On 12/04/2017 09:54 AM, Andreas Jaeger wrote:
> ERROR: Failure downloading 
> https://search.maven.org/remotecontent?filepath=org/zanata/zanata-cli/3.8.1/zanata-cli-3.8.1-dist.tar.gz,
>  
> HTTP Error 503: Service Unavailable: Back-end server is at capacity
> 
> Could we cache this, please? Any takers?

There are several ways we could do this

 1. Stick it on tarballs.o.o -- which isn't local but may be more reliable
 2. Actually mirror via AFS -- a bit of a pain to setup for one file
 3. cache via reverse proxy -- possible
 4. add to CI images -- easy to do and avoid remote failures.

So I've proposed 4 in [1] and we can discuss further...
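
For reference, option 4 boils down to something like the below at
image-build time; the cache path here is an assumption from memory,
the review has the real details:

---
# pre-fetch the dist tarball onto the image so jobs don't have to hit
# the maven.org CDN at run time
$ sudo mkdir -p /opt/cache/files
$ sudo wget -O /opt/cache/files/zanata-cli-3.8.1-dist.tar.gz \
    "https://search.maven.org/remotecontent?filepath=org/zanata/zanata-cli/3.8.1/zanata-cli-3.8.1-dist.tar.gz"
# keep a checksum alongside so consumers can verify the download
$ sha256sum /opt/cache/files/zanata-cli-3.8.1-dist.tar.gz
---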

-i

[1] https://review.openstack.org/525050

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

[openstack-dev] [infra][all] Removal of packages from bindep-fallback

2017-11-15 Thread Ian Wienand
Hello,

Some time ago we started the process of moving towards projects being
more explicit about their binary dependencies using bindep [1]

To facilitate the transition, we created a "fallback" set of
dependencies [2] which are installed when a project does not specify
its own bindep dependencies.  This essentially replicated the rather
ad-hoc environment provided by CI images before we started the
transition.

This list has acquired a few packages that cause some problems in
various situations today, particularly packages that aren't available in the
increasing number of distributions we provide, or packages that come
from alternative repositories.

To this end, [3,4] proposes the removal of

 liberasurecode-*
 mongodb-*
 python-zmq
 redis
 zookeeper
 ruby-*

from the fallback packages.  This has a small potential to affect some
jobs that tacitly rely on these packages.

NOTE: this does *not* affect devstack jobs (devstack manages its own
dependencies outside bindep) and if you want them back, it's just a
matter of putting them into the bindep file in your project (and as a
bonus, you have better dependency descriptions for your code).
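
For example, a minimal bindep.txt sketch (the package names here are
just illustrative, use whatever your project actually needs):

---
$ cat bindep.txt
# build dependencies for our C extensions
gcc
libffi-dev [platform:dpkg]
libffi-devel [platform:rpm]
# only wanted when running the "test" profile
mysql-client [platform:dpkg test]

# show anything missing for the test profile on this host
$ bindep test
---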

We should be able to then remove centos-release-openstack-* from our
centos base images too [5], which will make life easier for projects
such as triple-o who have to work-around that.

If you have concerns, please reach out either via mail or in
#openstack-infra

Thank you,

-i

[1] https://docs.openstack.org/infra/bindep/
[2] 
https://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/data/bindep-fallback.txt
[3] https://review.openstack.org/519533
[4] https://review.openstack.org/519534
[5] https://review.openstack.org/519535

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Building Openstack Trove Images

2017-11-07 Thread Ian Wienand
On 11/07/2017 05:40 PM, Ritesh Vishwakarma wrote:
> as the *dib-lint* file is there instead of the mentioned
> *disk-image-create *and when executed just verifies the other
> elements.

Those instructions unfortunately look out of date for master
diskimage-builder.  I will try to get a minute to parse them and
update later.

You will probably have a lot more success installing diskimage-builder
into a virtualenv; see [1] ... then activate the virtualenv and use
disk-image-create from there.  Likely the rest will work.
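
Roughly (the element names will depend on what the trove docs ask for,
so treat these as placeholders):

---
$ virtualenv dib-env && . dib-env/bin/activate
$ pip install diskimage-builder
$ disk-image-create -o my-image.qcow2 vm ubuntu-minimal
---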

If diskimage-builder is the problem, feel free to jump into
#openstack-dib (best during .au hours to catch me) and we can help.

[1] 
https://docs.openstack.org/diskimage-builder/latest/user_guide/installation.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] Sydney Infra evening

2017-11-07 Thread Ian Wienand
Let's meet at the swirlly fountain pit about 6:10pm

Preliminary plan is a ferry, dinner, walk and drinks

Not to sound like your Mum/Mom but a light jacket and comfortable shoes
suggested :)

-i

On 1 Nov. 2017 10:59 am, "Ian Wienand" <iwien...@redhat.com> wrote:

On 10/18/2017 05:37 PM, Ian Wienand wrote:

> Hi all,
>
> As discussed in the meeting, I've started a page for planning an infra
> evening in Sydney (but note -- ALL welcome)
>
>https://ethercalc.openstack.org/lx7zv5denrb9
>

It looks like Wednesday night (8th) and the more active/pub crawl
option for those interested.

Cheers,

-i
___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Add member to upstream-institute-virtual-environment-core group

2017-11-01 Thread Ian Wienand
On 11/01/2017 09:27 PM, Ian Y. Choi wrote:
> Could you please add "Mark Korondi" in 
> upstream-institute-virtual-environment-core group?
> He is the bootstrapper of the project: 

It seems Mark has managed to get two gerrit accounts:

| registered_on       | full_name    | preferred_email        | contact_filed_on    |
|---------------------+--------------+------------------------+---------------------|
| 2015-12-16 17:55:29 | Mark Korondi | korondi.m...@gmail.com | 2014-03-06 21:07:35 |
| 2017-01-07 22:33:55 | Mark Korondi | korondi.m...@gmail.com | NULL                |

I have removed the second one and added the remaining account to the
group (I also added you in case of issues).
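
(For the record, the table above came out of the ReviewDB with
something roughly like the following; treat the exact gsql invocation
as from-memory:)

---
$ ssh -p 29418 review.openstack.org gerrit gsql \
    -c "SELECT registered_on, full_name, preferred_email, contact_filed_on \
        FROM accounts WHERE full_name = 'Mark Korondi'"
---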

Mark -- if you're having issues, reach out in #openstack-infra

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Sydney Infra evening

2017-10-31 Thread Ian Wienand

On 10/18/2017 05:37 PM, Ian Wienand wrote:

Hi all,

As discussed in the meeting, I've started a page for planning an infra
evening in Sydney (but note -- ALL welcome)

   https://ethercalc.openstack.org/lx7zv5denrb9


It looks like Wednesday night (8th) and the more active/pub crawl
option for those interested.

Cheers,

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] Fwd: [Distutils][pbr][devstack][qa] Announcement: Pip 10 is coming, and will move all internal APIs

2017-10-22 Thread Ian Wienand

On 10/22/2017 12:18 AM, Jeremy Stanley wrote:

Right, on Debian/Ubuntu it's not too terrible (cloud-init's
dependencies are usually the biggest issue there and we manage to
avoid them by building our own images with no cloud-init), but on
Red Hat derivatives there are a lot of deep operating system
internals built on top of packaged Python libraries which simply
can't be uninstalled cleanly nor safely.


Also note though, if it can be uninstalled, we have often had problems
with the packages coming back and overwriting the pip installed
version, which often leads to very obscure problems.  For this reason
in various bits of devstack/devstack-gate/dib's pip install etc we
often install and pin packages to let pip overwrite them.

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Fwd: [Distutils][pbr] Announcement: Pip 10 is coming, and will move all internal APIs

2017-10-22 Thread Ian Wienand

On 10/21/2017 07:14 AM, Clark Boylan wrote:

The current issue this change is facing can be seen at
http://logs.openstack.org/25/513825/4/check/legacy-tempest-dsvm-py35/c31deb2/logs/devstacklog.txt.gz#_2017-10-20_20_07_54_838.
The tl;dr is that for distutils installed packages (basically all the
distro installed python packges) pip refuses to uninstall them in order
to perform upgrades because it can't reliably determine where all the
files are. I think this is a new pip 10 behavior.

In the general case I think this means we can not rely on global pip
installs anymore. This may be a good thing to bring up with upstream
PyPA as I expect it will break a lot of people in a lot of places (it
will break infra for example too).


deja-vu!  pip 8 tried this and quickly reverted.  I wrote a long email
with all the details, but then figured that's not going to help much
so translated it into [1].

-i

[1] https://github.com/pypa/pip/issues/4805

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra] Short gerrit / zuul outage 2017-10-20 20:00UTC

2017-10-20 Thread Ian Wienand

On 10/20/2017 03:46 PM, Ian Wienand wrote:

We plan a short outage (<30 minutes) of gerrit and zuul on 2017-10-20
20:00UTC to facilitate project rename requests.


Note this has been postponed to a future (TBD) date

Thanks,

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [infra] Short gerrit / zuul outage 2017-10-20 20:00UTC

2017-10-19 Thread Ian Wienand

Hello,

We plan a short outage (<30 minutes) of gerrit and zuul on 2017-10-20
20:00UTC to facilitate project rename requests.

In flight jobs should be restarted, but if something does go missing a
"recheck" comment will work.

Thanks,

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[OpenStack-Infra] Sydney Infra evening

2017-10-18 Thread Ian Wienand

Hi all,

As discussed in the meeting, I've started a page for planning an infra
evening in Sydney (but note -- ALL welcome)

  https://ethercalc.openstack.org/lx7zv5denrb9

I put an active, less active and easy option.  Just fill it in and
we'll see where we're at.

Cheers,

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Nominating new project-config and zuul job cores

2017-10-17 Thread Ian Wienand

On 10/14/2017 03:25 AM, Clark Boylan wrote:

I'd like to nominate a few people to be core on our job related config
repos. Dmsimard, mnaser, and jlk have been doing some great reviews
particularly around the Zuul v3 transition. In recognition of this work
I propose that we give them even more responsibility and make them all
cores on project-config, openstack-zuul-jobs, and zuul-jobs.

Please chime in with your feedback.


++ nice to see a lively project!

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [all] Zuul v3 Rollout Update - devstack-gate issues edition

2017-10-12 Thread Ian Wienand

On 10/12/2017 04:28 PM, Ian Wienand wrote:

- logs issues

Should be behind us.  The logs partition ran out of inodes, causing
log upload failures.  Pruning jobs should have rectified this.


This time it's true :)  But please think about this with your jobs, and
don't upload hundreds of little files unnecessarily.


- Ubuntu package issues

You may notice a range of issues with Ubuntu packages.  The root cause
is that our mirror is behind due to a broken reprepro.


Thanks to the efforts of jeblair and pabelanger, the ubuntu mirror
has been restored.  There should be no more issues relating to out
of date mirrors.


- system-config breakage


resolved


- devstack-gate cache copying


resolved

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] [openstack-dev] [all] Zuul v3 Rollout Update - devstack-gate issues edition

2017-10-12 Thread Ian Wienand
On 10/12/2017 05:52 PM, Ian Wienand wrote:
> I tried this in order, firstly recreating references.db (didn't help)
> and so I have started the checksums.db recreation.  This is now
> running; I just moved the old one out of the way

Well, that didn't go so well.  The output flooded stuff and then it
died.

---
...
Within references.db subtable references at get: No such file or directory
BDB0134 read: 0x11989b0, 4096: No such file or directory
Internal error of the underlying BerkeleyDB database:
Within references.db subtable references at get: No such file or directory
BDB0134 read: 0x11989b0, 4096: No such file or directory
Internal error of the underlying BerkeleyDB database:
Within references.db subtable references at get: No such file or directory
37 files were added but not used.
The next deleteunreferenced call will delete them.
BDB0151 fsync: Connection timed out
BDB0164 close: Connection timed out
./db/checksums.db: Connection timed out
BDB3028 ./db/checksums.db: unable to flush: Connection timed out
db_close(checksums.db, pool): Connection timed out
Error creating './db/version.new': Connection timed out(errno is 110)
Error 110 deleting lock file './db/lockfile': Connection timed out!
There have been errors!
---

Presumably this matches up with the AFS errors logged

---
[Thu Oct 12 09:19:59 2017] afs: Lost contact with file server 104.130.138.161 
in cell openstack.org (code -512) (all multi-homed ip addresses down for the 
server)
[Thu Oct 12 09:19:59 2017] afs: Lost contact with file server 104.130.138.161 
in cell openstack.org (code -512) (all multi-homed ip addresses down for the 
server)
[Thu Oct 12 09:19:59 2017] afs: failed to store file (110)
[Thu Oct 12 09:20:02 2017] afs: failed to store file (110)
[Thu Oct 12 09:20:10 2017] afs: file server 104.130.138.161 in cell 
openstack.org is back up (code 0) (multi-homed address; other same-host 
interfaces may still be down)
[Thu Oct 12 09:20:10 2017] afs: file server 104.130.138.161 in cell 
openstack.org is back up (code 0) (multi-homed address; other same-host 
interfaces may still be down)
---

I restarted for good luck, but if these are transient network issues, I
guess it will just happen again.  ping shows no packet loss, but very
occasional latency spikes, fwiw.

We restarted mirror-update; maybe it's worth restarting the AFS
servers too?

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [all] Zuul v3 Rollout Update - devstack-gate issues edition

2017-10-11 Thread Ian Wienand
There are still significant issues

- logs issues

Should be behind us.  The logs partition ran out of inodes, causing
log upload failures.  Pruning jobs should have rectified this.

- Ubuntu package issues

You may notice a range of issues with Ubuntu packages.  The root cause
is that our mirror is behind due to a broken reprepro.  Unfortunately, we
build our daily images against an external upstream mirror, so they
have been built using later packages than our un-updated region
mirrors provide, leading apt to great confusion.  Some debugging notes
on reprepro at [1], but I have to conclude the .db files are corrupt
and I have no idea how to recreate these other than to start again.

I think the most expedient solution here will be to turn /ubuntu on
mirrors into a caching reverse proxy for upstream.  However;

- system-config breakage

The system-config gate is broken due to an old pip pin with [2].
However, despite this merging several hours ago, zuulv2 doesn't seem
to want to reload to pick this up.  I have a suspicion that because it
was merged by zuulv3, zuulv2 may have missed it.  I'm not sure, and don't
think even turning the jobs -nv will help.

- devstack-gate cache copying

This means the original devstack-gate cache issues [3] remain unmerged
at this point.

[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-12.log.html#t2017-10-12T04:04:16
[2] https://review.openstack.org/511360
[3] https://review.openstack.org/511260

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [DIB] DIB Meetings

2017-10-05 Thread Ian Wienand

On 10/06/2017 02:19 AM, Andreas Scheuring wrote:

seems like there is some confusing information about the DIB
meetings in the wiki [1]. The meeting is alternating between 15:00
and 20:00 UTC.  But whenever the Text says 15:00 UTC, the link
points to a 20:00 UTC worldclock site and vice versa.



What is the correct meeting time? At least today 15:00 UTC no one
was there...


Sorry about that, the idea was to alternate every 2 weeks between an
EU time and a APAC/USA time.  But as you noted I pasted everything in
backwards causing great confusion :) Thanks to tonyb we're fixed up
now.


I put an item on the agenda for today's meeting but can't make 20:00
UTC today. It would be great if you could briefly discuss it and
provide feedback on the patch (it's about adding s390x support to
DIB). I'm also open for any offline discussions.


Sorry, with all going on this fell down a bit.  I'll comment there

Thanks,

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [devstack] zuulv3 gate status; LIBS_FROM_GIT failures

2017-09-28 Thread Ian Wienand

On 09/29/2017 03:37 PM, Ian Wienand wrote:

I'm not aware of issues other than these at this time


Actually, that is not true.  legacy-grenade-dsvm-neutron-multinode is
also failing for unknown reasons.  Any debugging would be helpful,
thanks.

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [devstack] zuulv3 gate status; LIBS_FROM_GIT failures

2017-09-28 Thread Ian Wienand

Hi,

There's a few issues with devstack and the new zuulv3 environment

LIBS_FROM_GIT is broken due to the new repos not having a remote
setup, meaning "pip freeze" doesn't give us useful output.  [1] just
disables the test as a quick fix for this; [2] is a possible real fix
but should be tried a bit more carefully in case there are corners I
missed.  This will be affecting other projects.
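
If I'm describing the mechanism correctly, devstack greps "pip freeze"
output for the git location of each LIBS_FROM_GIT library; an editable
install normally shows up keyed off the repo's origin remote, roughly
like the line below, and the zuulv3-prepared repos have no such remote
to report:

---
$ pip freeze | grep keystoneauth
-e git+https://git.openstack.org/openstack/keystoneauth@<sha>#egg=keystoneauth1
---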

However, before we can get this in, we need to fix the gate.  The
"updown" tests have missed a couple of requirement projects due to
them setting flags that were not detected during migration.  [3] is a
fix for that and seems to work.

For some reason, the legacy-tempest-dsvm-nnet job is running against
master, and failing as nova-net is deprecated there.  I'm clutching at
straws to understand this one, as it seems like the branch filters are
setup correctly; [4] is one guess?

I'm not aware of issues other than these at this time

-i

[1] https://review.openstack.org/508344
[2] https://review.openstack.org/508366
[3] https://review.openstack.org/508396
[4] https://review.openstack.org/508405

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[OpenStack-Infra] [incident] OVH-BHS1 mirror disappeared

2017-09-20 Thread Ian Wienand

At around Sep 21 02:30UTC mirror01.bhs1.ovh.openstack.org became
uncontactable and jobs in the region started to fail.

The server was in an ACTIVE state but uncontactable.  I attempted to
get a console but either a log or url request returned 500 (request
IDs below if it helps).

 ... console url show ...
The server has either erred or is incapable of performing the requested 
operation. (HTTP 500) (Request-ID: req-5da4cba2-efe8-4dfb-a8a7-faf490075c89)
 ...  console log show ...
The server has either erred or is incapable of performing the requested 
operation. (HTTP 500) (Request-ID: req-80beb593-b565-42eb-8a97-b2a208e3d865)

I could not figure out how to log into the web console with our
credentials.

I attempted to hard-reboot it, and it currently appears stuck in
HARD_REBOOT.  Thus I have placed nodepool.o.o in the emergency file
and set max-servers for the ovh-bhs1 region to 0

I have left it at this, as hopefully it will be beneficial for both
OVH and us to diagnose the issue since the host was definitely not
expected to disappear.  After this we can restore or rebuild it as
required.

Thanks,

-i

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-19 Thread Ian Wienand

On 09/20/2017 09:30 AM, David Moreau Simard wrote:

At what point does it become beneficial to build more than one image per OS
that is more aggressively tuned/optimized for a particular purpose ?


... and we can put -dsvm- in the job names to indicate it should run
on these nodes :)

Older hands than myself will remember even more issues, but the
"thicker" the base-image has been has traditionally just lead to a lot
more corners for corner-cases can hide in.  We saw this all the time
with "snapshot" images where we'd be based on upstream images that
would change ever so slightly and break things, leading to
diskimage-builder and the -minimal build approach.

That said, in a zuulv3 world where we are not caching all of git and have
considerably smaller images, where nodepool has a scheduler that
accounts for flavor sizes and could conceivably understand similar for
images, and where we're building with discrete elements that could
sanely "bolt on" things like a list-of-packages install to daily
builds ... it's not impossible to imagine.

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [devstack] Why do we apt-get install NEW files/debs/general at job time ?

2017-09-19 Thread Ian Wienand

On 09/19/2017 11:03 PM, Jeremy Stanley wrote:

On 2017-09-19 14:15:53 +0200 (+0200), Attila Fazekas wrote:
[...]

The jobs does 120..220 sec apt-get install and packages defined
/files/debs/general are missing from the images before starting the job.



Is the time spent at this stage mostly while downloading package
files (which is what that used to alleviate) or is it more while
retrieving indices or installing the downloaded packages (things
having them pre-retrieved on the images never solved anyway)?


As you're both aware, but others may not be, at the end of the logs
devstack does keep a timing overview that looks something like

=
DevStack Component Timing
=
Total runtime1352

run_process   15
test_with_retry4
apt-get-update 2
pip_install  270
osc  365
wait_for_service  29
dbsync23
apt-get  137
=

That doesn't break things down into download v install, but apt does
have a download summary that can be grepped for

---
$ cat devstacklog.txt.gz | grep Fetched
2017-09-19 17:52:45.808 | Fetched 39.3 MB in 1s (26.3 MB/s)
2017-09-19 17:53:41.115 | Fetched 185 kB in 0s (3,222 kB/s)
2017-09-19 17:54:16.365 | Fetched 23.5 MB in 1s (21.1 MB/s)
2017-09-19 17:54:25.779 | Fetched 18.3 MB in 0s (35.6 MB/s)
2017-09-19 17:54:39.439 | Fetched 59.1 kB in 0s (0 B/s)
2017-09-19 17:54:40.986 | Fetched 2,128 kB in 0s (40.0 MB/s)
2017-09-19 17:57:37.190 | Fetched 333 kB in 0s (1,679 kB/s)
2017-09-19 17:58:17.592 | Fetched 50.5 MB in 2s (18.1 MB/s)
2017-09-19 17:58:26.947 | Fetched 5,829 kB in 0s (15.5 MB/s)
2017-09-19 17:58:49.571 | Fetched 5,065 kB in 1s (3,719 kB/s)
2017-09-19 17:59:25.438 | Fetched 9,758 kB in 0s (44.5 MB/s)
2017-09-19 18:00:14.373 | Fetched 77.5 kB in 0s (286 kB/s)
---

As mentioned, we set up the package manager to point to a region-local
mirror during node bringup.  Depending on the i/o situation, it is
probably just as fast as coming off disk :) Note (also as mentioned)
these were never pre-installed, just pre-downloaded to an on-disk
cache area (as an aside, I don't think dnf was ever really happy with
that situation and kept being too smart and clearing its caches).

If you're feeling regexy you could maybe do something similar with the
pip "Collecting" bits in the logs ... one idea for investigation down
that path is if we could save time by somehow collecting larger
batches of requirements and doing fewer pip invocations?
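
e.g. something as simple as counting them, and the batching idea is
really just the difference between the two shapes of invocation below:

---
$ cat devstacklog.txt.gz | grep -c 'Collecting '

# many separate invocations, each paying resolver/startup overhead ...
$ pip install oslo.config
$ pip install oslo.log
# ... versus one batched invocation that pays it once
$ pip install oslo.config oslo.log
---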

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [diskimage-builder] Does anyone use "fedora" target?

2017-08-29 Thread Ian Wienand

Hi,

The "fedora" element -- the one that downloads the upstream .qcow2 and
re-packages it -- is currently broken as the links we use have
disappeared [1].  Even allowing for this, it's still broken with some
changes to the kernel install scripts [2].  AFAICT, the only thing
noticing this is our CI.

fedora-minimal takes a different approach of building the system
within a blank chroot.  It's what we use to create the upstream
images.

I believe the octavia jobs switched to fedora-minimal?

Is there anyone still using these image-based jobs?  Is there any
reason why you can't use fedora-minimal?  I don't really see this as
being that useful, and our best path forward might be just to retire
it.

Thanks,

-i

[1] https://review.openstack.org/497734
[2] https://bugs.launchpad.net/diskimage-builder/+bug/1713381

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [all] Python 3.6 testing is available on Fedora 26 nodes

2017-08-24 Thread Ian Wienand
Hello,

In a recent discussion [1] I mentioned we could, in theory, use Fedora
26 for Python 3.6 testing (3.6.2, to be exact).  After a few offline
queries we have put theory into practice, sorted out remaining issues
and things are working.

For unit testing (tox), you can use the
'gate-{name}-python36-{node}-nv' job template with fedora-26 nodes.
For an example, see [2] (which, I'm happy to report, found a real
issue [3] :).  You may need to modify your bindep.txt files to install
correct build packages for RPM platforms; in terms of general
portability this is probably not a bad thing anyway.
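
Locally, if you have a Fedora 26 machine handy, a quick sanity check is
just something like the below (assuming your tox.ini has, or generates,
a py36 environment):

---
$ sudo dnf install -y python3-devel gcc    # F26 gives you Python 3.6
$ tox -e py36
---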

I have an up/down devstack test working with a minor change [4].  I
will work on getting this more stable and more complete, but if this
is of interest, reach out.  In general, I track the centos & fedora
jobs fairly closely at [5] to try and keep up with any systemic
issues.

Although it is not exactly trivial, there are fairly complete
instructions in [6] to help build a Fedora image that looks like
the infra ones for testing purposes.  You can also reach out and we
can do things like place failed nodes on hold if there are hard to
debug issues.

Thanks,

-i

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-August/120888.html
[2] 
https://git.openstack.org/cgit/openstack-infra/project-config/commit/?id=5fe3ba95616136709a319ae1cd3beda38a299a13
[3] https://review.openstack.org/496054
[4] https://review.openstack.org/496098
[5] http://people.redhat.com/~iwienand/devstack-status/
[6] 
https://git.openstack.org/cgit/openstack-infra/project-config/tree/tools/build-image.sh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][infra]Plan to support Python 3.6 ?

2017-08-10 Thread Ian Wienand

On 08/09/2017 11:25 PM, ChangBo Guo wrote:

We received some Python 3.6 related bugs recently [1][2]. That made me think
about the plan to support Python 3.6 for OpenStack in the future.  Python
3.6 was released on December 23, 2016 and has some different behaviors from
Python 3.5 [3]. I talked with cdent on IRC and would like to discuss this
through the mailing list, and suggest a discussion at the PTG [3].

1. what's the timeline to support Python 3.6?

2. what's the plan or process?


If you really want to live on the edge, Fedora 26 CI nodes are
available and include Python 3.6.  I'll be happy to help if you're not
familiar with using different nodes for jobs, or with any issues.

I've had devstack going in experimental successfully.  I probably
wouldn't recommend throwing it in as a voting gate job straight away,
but everything should be there :)

-i


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[OpenStack-Infra] citycloud lon1 mirror postmortem

2017-08-10 Thread Ian Wienand

Hi,

In response to sdague reporting that citycloud jobs were timing out, I
investigated the mirror, suspecting it was not providing data fast enough.

There were some 170 htcacheclean jobs running, and the host had a load
over 100.  I killed all these, but performance was still unacceptable.

I suspected networking, but since the host was in such a bad state I
decided to reboot it.  Unfortunately it would get an address from DHCP
but seemed to have DNS issues ... eventually it would ping but nothing
else was working.

nodepool.o.o was placed in the emergency file and I removed lon1 to
avoid jobs going there.

I used the citycloud live chat, and Kim helpfully investigated and
ended up migrating mirror.lon1.citycloud.openstack.org to a new
compute node.  This appeared to fix things, for us at least.

nodepool.o.o is removed from the emergency file and original config
restored.

With hindsight, clearly the excessive htcacheclean processes were a
feedback effect: the network/DNS issues made each run slow, so runs
started to bunch up over time.  However, I still think we could
minimise further issues by running it under a lock [1].  Other than that,
I'm not sure there is much else we can do; I think this was largely an
upstream issue.
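
The idea in [1] is essentially just serialising the cleaner; roughly
the below, where the cache path and size limit are illustrative and
the review has the real values:

---
# never allow more than one cleaner at a time; exit immediately if one
# already holds the lock
$ flock -n /var/run/htcacheclean.lock \
    htcacheclean -n -p /var/cache/apache2/proxy -l 70G
---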

Cheers,

-i

[1] https://review.openstack.org/#/c/492481/

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] [heat][infra] Help needed! high gate failure rate

2017-08-10 Thread Ian Wienand
On 08/10/2017 06:18 PM, Rico Lin wrote:
> We're facing a high failure rate in Heat's gates [1]; four of our gate jobs
> are suffering failure rates from 6% to nearly 20% over 14 days, which leaves
> most of our patches stuck in the gate.

There has been a confluence of things causing some problems recently.
The loss of OSIC has distributed more load over everything else, and
we have seen an increase in job timeouts and intermittent networking
issues (especially if you're downloading large things from remote
sites).  There have also been some issues with the mirror in rax-ord
[1]

> gate-heat-dsvm-functional-convg-mysql-lbaasv2-ubuntu-xenial(19.67%)
> gate-heat-dsvm-functional-convg-mysql-lbaasv2-non-apache-ubuntu-xenia(9.09%)
> gate-heat-dsvm-functional-orig-mysql-lbaasv2-ubuntu-xenial(8.47%)
> gate-heat-dsvm-functional-convg-mysql-lbaasv2-py35-ubuntu-xenial(6.00%)

> We are still trying to find the cause, but (IMO) it seems something might be
> wrong with our infra. We need some help from the infra team to know if there
> is any clue about this failure rate.

The reality is you're just going to have to triage this and be a *lot*
more specific with issues.  I find opening an etherpad and going
through the failures one-by-one helpful (e.g. I keep [2] for centos
jobs I'm interested in).

Looking at the top of the console.html log you'll find the host and
provider/region stamped in there.  If it's timeouts or network issues,
reporting the time, provider and region of failing jobs to infra will
help.  Finding patterns is the first step to understanding what needs
fixing.

If it's due to issues with remote transfers, we can look at either
adding specific things to mirrors (containers, images, packages are
all things we've added recently) or adding a caching reverse-proxy for
them ([3],[4] some examples).

Questions in #openstack-infra will usually get a helpful response too

Good luck :)

-i

[1] https://bugs.launchpad.net/openstack-gate/+bug/1708707/
[2] https://etherpad.openstack.org/p/centos7-dsvm-triage
[3] https://review.openstack.org/491800
[4] https://review.openstack.org/491466

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [infra][devstack] DIB builds after mysql.qcow2 removal

2017-07-17 Thread Ian Wienand

On 07/18/2017 10:01 AM, Tony Breeds wrote:

It wasn't forgotten as such, there are jobs still using it/them.  If
keeping the branches around causes bigger problems then EOLing them is
fine.  I'll try to generate a list of the affected projects/jobs and
turn them off.


Thanks; yeah this was pointed out to me later.

I think any jobs can use the -eol tag, rather than the
branch if required (yes, maybe easier said than done depending on how
many layers of magic there are :).  There doesn't seem to be much
point in branches we can't commit to due to broken CI, and I doubt
anyone is keen to maintain it.

-i

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [infra][devstack] DIB builds after mysql.qcow2 removal

2017-07-17 Thread Ian Wienand
Hi,

The removal of the mysql.qcow2 image [1] had a flow-on effect noticed
first by Paul in [2] that the tools/image_list.sh "sanity" check was
not updated, leading to DIB builds failing in a most unhelpful way as
it tries to cache the images for CI builds.

So while [2] fixes the problem, one complication here is that the
caching script [3] loops through the open devstack branches and tries
to collect the images to cache.

Now it seems we hadn't closed the liberty or mitaka branches.  This
causes a problem, because the old branches refer to the old image, but
we can't actually commit a fix to change them because the branch is
broken (such as [4]).

I have taken the liberty of EOL-ing stable/liberty and stable/mitaka
for devstack.  I get the feeling it was just forgotten at the time.
Comments in [4] support this theory.  I have also taken the liberty of
approving backports of the fix to newton and ocata branches [5],[6].

A few 3rd-party CI people using dib have noticed this failure.  As the
trio of [4],[5],[6] move through, your builds should start working
again.

Thanks,

-i

[1] https://review.openstack.org/482600
[2] https://review.openstack.org/484001
[3] 
http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/elements/cache-devstack/extra-data.d/55-cache-devstack-repos
[4] https://review.openstack.org/482604
[5] https://review.openstack.org/484299
[6] https://review.openstack.org/484298

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[OpenStack-Infra] [infra][nova] Corrupt nova-specs repo

2017-06-30 Thread Ian Wienand
Hi,

Unfortunately it seems the nova-specs repo has undergone some
corruption, currently manifesting itself in an inability to be pushed
to github for replication.

Upon examination, it seems there's a problem with a symlink and
probably jgit messing things up making duplicate files.  I have filed
a gerrit bug at [1] (although it's probably jgit, but it's just a
start).

Anyway, that leaves us the problem of cleaning up the repo into a
pushable state.  Here's my suggestion after some investigation:

The following are corrupt

---
$ git fsck
Checking object directories: 100% (256/256), done.
error in tree a494151b3c661dd9b6edc7b31764a2e2995bd60c: contains duplicate file 
entries
error in tree 26057d370ac90bc01c1cfa56be8bd381618e2b3e: contains duplicate file 
entries
error in tree 57423f5165f0f1f939e2ce141659234cbb5dbd4e: contains duplicate file 
entries
error in tree 05fd99ef56cd24c403424ac8d8183fea33399970: contains duplicate file 
entries
---

After some detective work [2], I related all these bad objects to the
refs that hold them.  It look as follows

---
fsck-bad: a494151b3c661dd9b6edc7b31764a2e2995bd60c
 -> 5fa34732b45f4afff3950253c74d7df11b0a4a36 refs/changes/26/463526/9

fsck-bad: 26057d370ac90bc01c1cfa56be8bd381618e2b3e
 -> 47128a23c2aad12761aa0df5742206806c1dfbb8 refs/changes/26/463526/8
 -> 7cf8302eb30b722a00b4d7e08b49e9b1cd5aacf4 refs/changes/26/463526/7
 -> 818dc055b971cd2b78260fd17d0b90652fb276fb refs/changes/26/463526/6

fsck-bad: 57423f5165f0f1f939e2ce141659234cbb5dbd4e

 -> 25bd72248682b584fb88cc01061e60a5a620463f refs/changes/26/463526/3
 -> c7e385eaa4f45b92e9e51dd2c49e799ab182ac2c refs/changes/26/463526/4
 -> 4b8870bbeda2320564d1a66580ba6e44fbd9a4a2 refs/changes/26/463526/5

fsck-bad: 05fd99ef56cd24c403424ac8d8183fea33399970
 -> e8161966418dc820a4499460b664d87864c4ce24 refs/changes/26/463526/2
---

So you may notice this is refs/changes/26/463526/[2-9]

Just deleting these refs and expiring the objects might be the easiest
way to go here, and seems to get things purged and fix up fsck

---
$ for i in `seq 2 9`; do
>  git update-ref -d refs/changes/26/463526/$i
> done

$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Counting objects: 44756, done.
Delta compression using up to 16 threads.
Compressing objects: 100% (43850/43850), done.
Writing objects: 100% (44756/44756), done.
Total 44756 (delta 31885), reused 12846 (delta 0)

$ git fsck
Checking object directories: 100% (256/256), done.
Checking objects: 100% (44756/44756), done.
---

I'm thinking if we then force push that to github, we're pretty much
OK ... a few intermediate reviews will be gone but I don't think
they're important in this context.

I had a quick play with "git ls-tree", edit the file, "git mktree",
"git replace" and then trying to use filter-branch, but couldn't get
it to work.  Suggestions welcome; you can play with the repo from [1]
I would say.

Thanks,

-i

[1] https://bugs.chromium.org/p/gerrit/issues/detail?id=6622
[2] "git log --all --format=raw --raw -t --no-abbrev" and search for
the change sha, then find it in "git show-ref"

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [openstack-dev] diskimage builder works for trusty but not for xenial

2017-06-21 Thread Ian Wienand

On 06/21/2017 04:44 PM, Ignazio Cassano wrote:

* Connection #0 to host cloud-images.ubuntu.com left intact
Downloaded and cached
http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-root.tar.gz,
having forced upstream caches to revalidate
xenial-server-cloudimg-amd64-root.tar.gz: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match



Are there any problems on http://cloud-images.ubuntu.com ?


There was [1] which is apparently fixed.

As Paul mentioned, the -minimal builds take a different approach and
build the image from debootstrap, rather than modifying the upstream
image.  They are generally well tested just as a side-effect of infra
relying on them daily.  You can use DIB_DISTRIBUTION_MIRROR to set
that to a local mirror and eliminate another source of instability
(however, that leaves the mirror in the final image ... a known issue.
Contributions welcome :)
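
i.e. something like:

---
$ export DIB_RELEASE=xenial
$ export DIB_DISTRIBUTION_MIRROR=http://your-local-mirror/ubuntu
$ disk-image-create -o xenial-minimal.qcow2 vm ubuntu-minimal
---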

-i

[1] https://bugs.launchpad.net/cloud-images/+bug/1699396


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

