Re: [OpenStack-Infra] Zuul v3: proposed new Depends-On syntax

2017-05-25 Thread Sean Dague
On 05/24/2017 07:04 PM, James E. Blair wrote:

> The natural way to identify a GitHub pull request is with its URL.
> 
> This can be used to identify Gerrit changes as well, and will likely be
> well supported by other systems.  Therefore, I propose we support URLs
> as the content of the Depends-On footers for all systems.  E.g.:
> 
>   Depends-On: https://review.openstack.org/12345
>   Depends-On: https://github.com/ansible/ansible/pull/12345
> 
> Similarly to the Gerrit change IDs, these identifiers are easily
> navigable within Gerrit (and Gertty), so that reviewers can traverse the
> dependency chain easily.

Sounds sensible to me. The only thing I ask is that we get a good
countdown clock on when the old behavior will be removed. Upgrade
testing is one of the places where the multi-branch magic was really
useful, so it will take a little while to get good at living without it.

For gerrit reviews it should also accept
https://review.openstack.org/#/c/467243/ (as that's what is in people's
browser URL bars).
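
That is, both of these forms would resolve to the same change:

  Depends-On: https://review.openstack.org/467243
  Depends-On: https://review.openstack.org/#/c/467243/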


And while this change is taking place, it would be nice if there were
the ability to have words after the URL. I've often wanted:

Depends-On: https://review.openstack.org/12345 - nova
Depends-On: https://review.openstack.org/12346 - python-neutronclient

Just as a quick way to remember, without having to follow links, which
of multiple Depends-On lines belongs to which project. I've resorted to
putting the notes above instead, but for short ones I'd love to have
them on the same line.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra

Re: [OpenStack-Infra] Connectivity issues in the gate

2016-09-28 Thread Sean Dague
On 09/28/2016 03:16 AM, Thomas Herve wrote:
> Hi all,
> 
> I believe since yesterday we started seeing really bad external
> connectivity issues in Heat function tests.
> 
> http://logs.openstack.org/52/359852/4/gate/gate-heat-dsvm-functional-convg-mysql-lbaasv2/7f45043/console.html
> is one of the numerous example. We try to fetch a fedora image from
> one of the mirror, and it fails.
> 
> It seems to be mostly (only?) on OSIC nodes. While in the past it's
> been flaky, now we're getting 50%+ failure rate. And it's not just
> slow, but failing to get any bytes out. It works sometimes though.
> 
> Let me know if we can help debugging that.
> 
> Thanks,

For most tests in the gate we try to avoid downloading content from
external resources entirely. That's why there are per-cloud package
mirrors and per-cloud pip mirrors.

Breakdowns in wide-area internet connectivity are very hard to
diagnose, and just as often the fault of the upstream source as of the
downstream network. My suggestion is to work with the infra team to
mirror whatever content you need within their existing mirror
structure.
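
As a hedged sketch of the idea (the mirror hostname, path, and image
name below are made up, not real infra endpoints):

  # prefer a region-local mirror, fall back to upstream only if needed
  MIRROR=http://mirror.regionone.osic.openstack.org/fedora
  UPSTREAM=https://download.fedoraproject.org/pub/fedora/linux
  curl -sfO "$MIRROR/Fedora-Cloud-Base.qcow2" ||
      curl -sfO "$UPSTREAM/Fedora-Cloud-Base.qcow2"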

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] A tool for slurping gerrit changes in to bug updates

2016-05-27 Thread Sean Dague
On 05/26/2016 03:07 PM, Jeremy Stanley wrote:
> On 2016-05-26 14:23:38 -0400 (-0400), Sean Dague wrote:
> [...]
>> It does run on a custom port... so the great firewall of China is still
>> probably an issue.
> [...]
> 
> I had assumed websockets could be a solution there?
> http://jpmens.net/2014/07/03/the-mosquitto-mqtt-broker-gets-websockets-support/

Oh, good point. Also, you could just bind to 80. You can bind to as many
ports as you want with it.
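
For reference, a hedged sketch of a mosquitto config serving both the
plain MQTT port and websockets on 80 (needs mosquitto >= 1.4; the
config path is an assumption):

  cat >> /etc/mosquitto/mosquitto.conf <<EOF
  listener 1883

  listener 80
  protocol websockets
  EOF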

>> In my ideal world of awesomeness, there would be an MQTT server in infra
>> which was getting data from all the relevant change sources
> [...]
> 
> This makes a good case for it running on a separate server then
> rather than directly on the Gerrit server.

Right, plus any load on this server would not impact the gerrit server.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] A tool for slurping gerrit changes in to bug updates

2016-05-26 Thread Sean Dague
On 05/26/2016 02:11 PM, Matthew Treinish wrote:
> On Thu, May 26, 2016 at 05:55:34PM +, Jeremy Stanley wrote:
>> On 2016-05-26 12:54:49 -0400 (-0400), Matthew Treinish wrote:
>>> Just a quick follow-up I started running this on a throwaway server at
>>> 15.184.138.236. So you can subscribe to events from gerrit.
>> [...]
>>
>> How resource-intensive is it? Curious whether it makes sense to run
>> something like this directly on review.openstack.org. If zuul grew
>> support for that mechanism, it might allow CI systems (third party
>> or even our own) to wean off using SSH entirely since this is a
>> problem in a lot of places from crazy enterprise firewall policies
>> to systems running in mainland China.

It does run on a custom port... so the great firewall of China is still
probably an issue.

> It's eating like nothing on my server right now. This is all running
> on a single-cpu VM on a private cloud with an "Intel Xeon E312xx
> (Sandy Bridge)" (according to /proc/cpuinfo). Mosquitto itself has a
> memory footprint < 1MB and I've seen it spike up to a whopping 1% cpu
> utilization. Although, this might increase as more subscribers are
> added. This is the first time I've played with mosquitto and mqtt, so
> I don't know what its scaling is like. But, I imagine it should handle
> a lot of subscriptions well since it's supposed to be an IoT thing.
> germqtt is eating a bit more, consuming about 1.5M of RAM, and it uses
> about the same CPU as Mosquitto.

In my ideal world of awesomeness, there would be an MQTT server in
infra which was getting data from all the relevant change sources
(gerrit changes; launchpad changes, which, yes, requires something
crazy like converting an email stream into events; zuul enqueue /
dequeue events).

And then every time someone wanted to build some ad hoc web tool to
take a slice of this, they could consume the event stream for updates
instead of doing what everyone does now and polling once an hour. As
long as the topic trees are well structured, it should make for an easy
way to get just the slice they need and react only to that.
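
For example, with a well-structured topic tree, a consumer's
subscription could be as simple as the following (the broker hostname
and topic layout here are made up):

  mosquitto_sub -h firehose.openstack.example.org -t 'gerrit/openstack/nova/#' -v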

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] A tool for slurping gerrit changes in to bug updates

2016-05-25 Thread Sean Dague
On 05/25/2016 01:54 PM, Spencer Krum wrote:
> 
> When working on a previous project, Nikki wrote this tool as a general
> purpose hook into gerrit:
> 
> https://github.com/notnownikki/zoidberg
> 
> I don't think she is actively maintaining or using it right now though.

One thing I've been thinking a bit about is whether the event stream
could get into something like MQTT easily. In completely unrelated
activities (https://home-assistant.io/) I've been playing with mosquitto
(http://mosquitto.org/) quite a bit, and the ease of consumption of mqtt
is quite nice. (You can even do it straight in javascript for web based
things).

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] right way to report odd infrastructure issues?

2016-02-12 Thread Sean Dague
On 02/12/2016 12:03 PM, Jeremy Stanley wrote:
> On 2016-02-12 06:41:56 -0500 (-0500), Sean Dague wrote:
> [...]
>> What bug / issue tracker should I be using so that we can build up
>> profiles of things like this? I know story board isn't a thing atm. The
>> openstack-gate project in launchpad I don't think gets looked at (we
>> mostly use it as a dumping ground for ER signatures that don't have a
>> clear home).
> [...]
> 
> Given the degree to which the Infra team is currently overloaded,
> sticking "something happened but I don't know what, here's a log"
> into an issue tracker is an even more effective black hole than
> asking someone in IRC to try and help figure out what happened. Sad,
> but true. The more effective path, which I'll grant is also not
> easy, is to try to add relevant additional debugging so that you can
> spot the cause yourself the next time it happens and have some hope
> of devising a fix for it thereafter.

For most of these issues infra-root access is needed, so self-debugging
isn't reasonable.

If the answer to how to address issues is only "go bug people in IRC
until you get attention", I don't think we're ever going to dig out of
the current overload. My hope was that if there were some more formal
way to register issues, then prioritization / triage could happen.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] right way to report odd infrastructure issues?

2016-02-12 Thread Sean Dague
I hit another oddball infrastructure issue this morning where workspace
setup took > 1 hr (leaving 34 minutes for the tempest job, which then
failed with a timeout) -
http://logs.openstack.org/27/279227/2/check/gate-tempest-dsvm-full-ceph/35c90b3/console.html

The current model is mostly to take these to IRC, but things get lost
there, especially as I tend to find many of these during light-coverage
windows.

What bug / issue tracker should I be using so that we can build up
profiles of things like this? I know storyboard isn't a thing atm. I
don't think the openstack-gate project in launchpad gets looked at (we
mostly use it as a dumping ground for ER signatures that don't have a
clear home).

Would love to have a better model for writing down what we see that can
get looked at when folks are around to poke at the issues.

    -Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] logstash URL queries.

2015-10-16 Thread Sean Dague
On 10/16/2015 07:45 AM, Tony Breeds wrote:
> Hi Everybody,
> So it looks to me like we recently updated logstash.openstack.org to a
> newer version of kibana?
> 
> I'll very openly admit that my logstash fu is wanting, but I'm having a
> little trouble
> 
> 1) I can't work out how, after performing my query/filtering, to share
>    the results (for example in a LP bug).  I tried using the sharable
>    link but it always comes up with:
>    http://logstash.openstack.org/#dashboard/undefined/undefined
>    and that URL doesn't work by magic :(
> 
>    How do we share queries now?
> 
> 2) It looks to me like all our old saved URLs are now invalid; this is
>    a bit of a problem for open bugs with logstash URLs in them.
> 
>    Is there any way we can make the old URLs work again?
> 
> 3) I'm sure that the following is a valid query:
>    message:"ValueError: git history requires a target version of pbr.version.SemanticVersion(5.0.1), but target version is pbr.version.SemanticVersion(5.0.0)" AND tags:"console" AND project:openstack/requirements
>    but it just spins and never seems to return any matches.  If instead
>    I set the query to:
>    message:"ValueError: git history requires a target version of pbr.version.SemanticVersion(5.0.1), but target version is pbr.version.SemanticVersion(5.0.0)"
>    and use the UI to add filters for tags:"console" and
>    project:openstack/requirements

It turns out the new kibana is stricter about quotes.
project:"openstack/requirements" in your query makes it work correctly.
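
That is, the full working form of the query from (3) becomes:

  message:"ValueError: git history requires a target version of pbr.version.SemanticVersion(5.0.1), but target version is pbr.version.SemanticVersion(5.0.0)" AND tags:"console" AND project:"openstack/requirements"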

> 
>I do get some results.
> 
>Is there a problem with my query or the backend?
> 
> Just to be clear, I don't mean this email to sound like I'm complaining, I'm
> just confused and looking for help.
> 
> 
> Yours Tony.
> 
> 
> 
> ___
> OpenStack-Infra mailing list
> OpenStack-Infra@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
> 


-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] logstash URL queries.

2015-10-16 Thread Sean Dague
On 10/16/2015 10:24 AM, Matthew Treinish wrote:
> On Fri, Oct 16, 2015 at 08:51:21AM -0400, Sean Dague wrote:
>> On 10/16/2015 07:45 AM, Tony Breeds wrote:
>>> Hi Everybody,
>>> So it looks to me like we recently updated logstash.openstack.org to
>>> a newer version of kibana?
>>>
>>> I'll very openly admit that my logstash fu is wanting, but I'm having
>>> a little trouble
>>>
>>> 1) I can't work out how, after performing my query/filtering, to
>>>    share the results (for example in a LP bug).  I tried using the
>>>    sharable link but it always comes up with:
>>>    http://logstash.openstack.org/#dashboard/undefined/undefined
>>>    and that URL doesn't work by magic :(
> 
> That's better than what I get, when I click it I get an empty bar.
> 
>>>
>>>    How do we share queries now?
>>
>> That's a really good question; without sharable URLs we're sort of
>> dead in the water.
> 
> Yeah, we need to make sure this works.
> 
>>
>>> 2) It looks to me like all our old saved URLs are now invalid; this
>>>    is a bit of a problem for open bugs with logstash URLs in them.
>>>
>>>    Is there any way we can make the old URLs work again?
>>>
>>> 3) I'm sure that the following is a valid query:
>>>    message:"ValueError: git history requires a target version of pbr.version.SemanticVersion(5.0.1), but target version is pbr.version.SemanticVersion(5.0.0)" AND tags:"console" AND project:openstack/requirements
>>>    but it just spins and never seems to return any matches.  If
>>>    instead I set the query to:
>>>    message:"ValueError: git history requires a target version of pbr.version.SemanticVersion(5.0.1), but target version is pbr.version.SemanticVersion(5.0.0)"
>>>    and use the UI to add filters for tags:"console" and
>>>    project:openstack/requirements
>>>
>>>    I do get some results.
>>>
>>>    Is there a problem with my query or the backend?
>>>
>>> Just to be clear, I don't mean this email to sound like I'm
>>> complaining, I'm just confused and looking for help.
>>
>> I noticed that we appear to have deployed v3.1.2, which was released
>> Nov 2014. Is there a reason we're not on 4.1.2, which is a more recent
>> version?
> 
> We need to upgrade elastic search to be able to use the latest kibana. The
> upgrade to kibana 3 was a prereq for upgrading elastic search. Once we have
> a newer version of ES then we can look at kibana 4.

Hmmm... ok, well the current kibana without working sharable URLs is
kind of a show stopper for using the tooling for bug tracking. So it
sounds like we need to either:

1. find the fix
2. move fast on the upgrade and accept the blackout of this whole
   toolchain in the meantime
3. roll back

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] logstash URL queries.

2015-10-16 Thread Sean Dague
On 10/16/2015 07:45 AM, Tony Breeds wrote:
> Hi Everybody,
> So it looks to me like we recently updated logstash.openstack.org to a
> newer version of kibabna?
> 
> I'll very openly admit that my logstash fu is wanting but I'm having a little
> trouble
> 
> 1) I can't work out how to after performing my query/filtering share the
>results (for example in a LP bug).  I tried using the sharable link but it
>always comes up with:
>http://logstash.openstack.org/#dashboard/undefined/undefined
>and that URL doesn't work by magic :(
> 
>How do we share queries now?

That's a really good question, without sharable urls we're sort of dead
in the water.

> 2) It looks to me like all our old saved URLs are now invalid, this is a big 
> of
>a problem for open bugs with logstash URLs in them.
> 
>Is there anyway we can make the old URLs work again?
> 
> 3) I'm sure that the following is a valid query:
>message:"ValueError: git history requires a target version of 
> pbr.version.SemanticVersion(5.0.1), but target version is 
> pbr.version.SemanticVersion(5.0.0)" AND tags:"console" AND 
> project:openstack/requirements
>but it just spins and never seems to return any matches.  If instead I set
>the query to :
>message:"ValueError: git history requires a target version of 
> pbr.version.SemanticVersion(5.0.1), but target version is 
> pbr.version.SemanticVersion(5.0.0)"
>and use the UI to add filters for tags:"console" and 
> project:openstack/requirements
> 
>I do get some results.
> 
>Is there a problem with my query or the backend?
> 
> Just to be clear, I don't mean this email to sound like I'm complaining, I'm
> just confused and looking for help.

I noticed that we've appeared to deploy v3.1.2, which was released Nov
2014. Is there a reason we're not on 4.1.2 which is a more recent version?

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Fwd: IBM DB2 CI spam a lot

2015-06-11 Thread Sean Dague
On 06/11/2015 10:02 AM, Anita Kuno wrote:
> On 06/11/2015 05:09 AM, yan fengxi wrote:
>> Hi, everyone. The annoying problem is now resolved. But I found the
>> DB2 CI account was already disabled. We need to get it enabled so
>> that we can test whether our solution works; we will not publish any
>> results to the community until then.
>>
>> @Anita Kuno, would you please help with this?
> 
> If Sean agrees this is satisfactory, then yes, I can support your system
> being re-enabled.

Yes, sounds fine.

> 
> Thank you,
> Anita.
>>
>>
>> -- Forwarded message --
>> From: yan fengxi 
>> Date: Thu, Jun 11, 2015 at 2:01 PM
>> Subject: IBM DB2 CI spam a lot
>> To: s...@dague.net
>> Cc: openstack-infra@lists.openstack.org, yanfen...@cn.ibm.com,
>> good...@gmail.com
>>
>>
>> Hi, Sean, I am the DB2 CI maintainer "yanfengxi". Thanks for
>> reminding us of this problem. It is indeed annoying to see merge
>> failures. Our team is now trying to resolve this problem by no longer
>> publishing this kind of failure. Until it is resolved, our DB2 CI
>> will not publish results to the community.
>>
>> Actually, not long ago, we hit merge failure errors. We configured
>> our zuul to stop publishing "Merge failure" errors on a specific
>> patch. But we were not aware that patches that depend on the failed
>> patch would also publish merge failures.
>>
>> Cheers
>> Feng Xi Yan(yanfen...@cn.ibm.com)
>>
>> On Thu, Jun 11, 2015 at 5:26 AM, Sean Dague wrote:
>>> The IBM DB2 CI seems to be running a Zuul, and seems to be reporting
>>> back on merge conflicts a lot in completely unhelpful ways -
>>> https://review.openstack.org/#/c/188148/
>>>
>>> It seems like no 3rd party CI should be sending merge conflict
>>> messages to gerrit. This is my formal complaint on that front, and
>>> I'd like the CI system turned off if it's not fixed in the near term.
>>>
>>> -Sean
>>
> 


-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] IBM DB2 CI spam a lot

2015-06-11 Thread Sean Dague
On 06/10/2015 06:05 PM, Michael Still wrote:
> I think it's fine for you to not want merge conflict messages from
> third party CI systems, but I do think we need to note that many of
> them do this now -- turbo hipster for example. If we turn off that
> message, then those CI systems will silently fail and we will need to
> be better at noticing that they've stopped working.
> 
> The merge failure messages are only really a problem for misconfigured
> CI systems. Isn't turning it off for everyone an over-reaction
> compared to the problem?

I had already sent the message privately, and it had been ignored. So I
put it here publicly, as apparently that's the only way to get the team
to respond. I actually hadn't intended this to be an immediate kill:
"off if it's *not* fixed in the near term."

This actually caused a great deal of confusion when addressing a
cross-repo patch series recently, hence my annoyance and the report.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] IBM DB2 CI spam a lot

2015-06-10 Thread Sean Dague
The IBM DB2 CI seems to be running a Zuul, and seems to be reporting
back on merge conflicts a lot in completely unhelpful ways -
https://review.openstack.org/#/c/188148/

It seems like no 3rd party CI should be sending merge conflict messages
to gerrit. This is my formal complaint on that front, and I'd like the
CI system turned off if it's not fixed in the near term.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] A proposal to use phabricator for issue tracking

2015-04-03 Thread Sean Dague
On 04/03/2015 12:12 PM, Monty Taylor wrote:
> On 04/03/2015 12:06 PM, Jeremy Stanley wrote:
>> On 2015-04-03 11:54:00 -0400 (-0400), Sean Dague wrote:
>> [...]
>>> 2) is there an event stream of changes (either real time or rss) that
>>> can be consumed by said tools? Having the change stream would be really
>>> helpful.
>>
>> Which relates to a feature request we hear all the time: "is there a
>> way to have bug events spammed to our IRC channel?" If LP had a
>> proper event stream, we'd probably already be doing that.
>>
> 
> We don't even necessarily need an external thing that consumes an event
> stream - we can deploy an internal agent:
> 
> https://secure.phabricator.com/book/phabdev/class/PhabricatorBot/
> 
> That could spam IRC. OR - that could inject events into a gearman queue
> that one of our existing IRC bots could then spam into channel. Totally
> agree ... would be a nice improvement over the current world.

Yeah, looking at feed.http-hooks -
https://secure.phabricator.com/T5462 - this actually seems reasonably
well done.
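
As a hedged sketch, wiring the feed up to an external consumer should
be roughly a one-liner from the phabricator checkout (the hook endpoint
URL here is made up):

  ./bin/config set feed.http-hooks '["https://firehose.example.org/phabricator-hook"]'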

Poking around a bit, it also looks like Phabricator is more designed
around the idea that you'd write a plugin within it for some of this
functionality. I think after living with launchpad as a
blackbox-as-a-service for so long, it will take a little getting used
to the fact that we can actually do this kind of thing in-service.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] A proposal to use phabricator for issue tracking

2015-04-03 Thread Sean Dague
On 04/03/2015 11:00 AM, Monty Taylor wrote:
> As a follow up to my previous email about thinking about ceasing to use
> storyboard, I have done some work towards investigating using phabricator.
> 
> phabricator came out of Facebook, but has been spun out completely and
> is now managed as an Open Source project. There is a company that has
> formed around it doing support and hosted installs ... so it has a
> vibrant and active developer community working on it.
> 
> In our broader infra ecosystem, we do lots of things with the Wikimedia
> Foundation. They are close collaborators on Jenkins Job Builder, and are
> also zuul users. They have recently migrated their bug tracking to
> phabricator ... so one could imagine that our collaboration could
> continue nicely.
> 
> There are several phabricator features we are not interested in - such
> as code review. Luckily it is easy to turn them off.
> 
> The phabricator model is not exactly what we want, but there are some
> nice things about it. It doesn't fully encompass our
> project-group/project/branch hierarchy. However, it does allow for an
> issue to have multiple other issues it's blocked on, as well as
> multiple issues it blocks. And an issue can be associated with more
> than one "project".
> 
> A "project" is actually more like an arbitrary tag. That's bad in that
> it's not a structured concept. It's good in that it's flexible. A
> "project" also carries with it a kanban-like workboard that allows for
> different priority settings on a per-project basis.
> 
> In any case - it's hard to think about those things without something to
> look at, so I set up a phabricator instance and did a data
> transformation of the current (ish) storyboard database:
> 
> http://15.126.194.141/
> 
> If you have ever logged in to storyboard, you have a login here with a
> password of "password". As you might imagine, I do not consider the data
> in this instance precious - I may or may not blow it away and reload it
> from newer database dumps or from improved data migration scripts at any
> time.
> 
> You may want to note the workboard concept:
> 
> The issue:
> 
> http://15.126.194.141/T2298
> 
> Is in both the openstack-infra/system-config and the openstack-ci
> projects. Each of those have a workboard:
> 
> http://15.126.194.141/tag/openstack-infra_system-config/
> http://15.126.194.141/tag/openstack-ci/
> 
> T2298 is in the backlog for openstack-infra/system-config but listed in
> priority efforts for openstack-ci.
> 
> Also, you can see that T2298 has a "pholio mock" -
> http://15.126.194.141/M1 - which is an image with design conversation
> associated with it.
> 
> The code to deploy this as well as do the data transformation can be
> found here:
> 
> https://github.com/emonty/puppet-phabricator
> 
> It's not perfect - but I figured that a conversation can't really be had
> without something to point at.

2 specific phabricator questions (which I'm running into in dealing with
the pile of Nova bugs).

1) what does the REST API support look like for building tools outside
of tree?

2) is there an event stream of changes (either real time or rss) that
can be consumed by said tools? Having the change stream would be really
helpful.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Thoughts on evolving Zuul

2015-03-02 Thread Sean Dague
On 02/28/2015 12:48 PM, Clint Byrum wrote:
> Excerpts from Jay Pipes's message of 2015-02-28 08:41:40 -0800:
>> Jim, great stuff. A couple suggestions inline :)
>>
>> On 02/26/2015 09:59 AM, James E. Blair wrote:
>>> A tenant may optionally specify repos from which it may derive its
>>> configuration.  In this manner, a repo may keep its Zuul configuration
>>> within its own repo.  This would only happen if the main configuration
>>> file specified that it is permitted::
>>>
>>>### main.yaml (continued)
>>>- tenant:
>>>name: random-stackforge-project
>>>include:
>>> - global_config.yaml
>>>repos:
>>> - stackforge/random  # Specific project config is in-repo
>>
>> Might I suggest that instead of a repos: YAML block, the include:
>> YAML block allow URIs. So, to support some random Zuul config in a
>> stackforge repo, you could do:
>>
>> include:
>>   - global_config.yaml
>>   - https://git.openstack.org/stackforge/random/tools/zuul.yml
>>
>> That would make the configuration simpler, I think.
>>
> 
> I see where you're driving at, but I'm not sure it is what we'd
> want. First, the knee jerk is zomg I don't want my service restart
> dependent on git.o.o being reachable. But that's not super relevant.
> 
> More importantly, having arbitrary remote config doesn't seem like the
> goal here. I think what Jim is suggesting is that since Zuul already knows
> about repos, that it might also want to dig around in said repos to find
> layouts so that we can let projects manage their own layout once infra
> has acknowledged that their repo is one that zuul needs to care about.
> 
> I usually am for decoupling all the things, but in this case, coupling
> provides a level of comfort for the Zuul operator that would express
> more succinctly what the operator intends: let this repo manage layouts
> for itself.
> 
> BTW, having this is really interesting because in theory one can disable
> a job that a change breaks in the same patch, or enable a job in the
> same change that fixes it. I really like that aspect.

Agreed, I think it's a great idea from a decoupling aspect, and it lets
projects have more control over how they are validated. Very neat.
-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] GlusterFS experimental job failing because of ERROR_ON_CLONE

2015-02-11 Thread Sean Dague
On 02/11/2015 02:08 AM, Bharat Kumar wrote:
> Hi All,
> 
> I issued "check experimental" on patch [1] to run the GlusterFS
> experimental job (check-tempest-dsvm-full-glusterfs-centos7).
> It is failing with the below error in the log file. [2]
> 
> [ERROR] /opt/stack/new/devstack/functions-common:629 Cloning not
> allowed in this configuration
> 
> This is because of the setting "ERROR_ON_CLONE=True" in the localrc
> file [3]; it is unable to clone the "devstack-plugin-glusterfs"
> repository.
> 
> Please let me know your comments on how to avoid this issue.
> 
> Thanks in advance.
> 
> [1] https://review.openstack.org/#/c/152286/
> [2]
> http://logs.openstack.org/86/152286/3/experimental/check-tempest-dsvm-full-glusterfs-centos7/b211fb1/logs/devstacklog.txt.gz
> [3]
> http://logs.openstack.org/86/152286/3/experimental/check-tempest-dsvm-full-glusterfs-centos7/b211fb1/logs/localrc.txt.gz

This is where it went wrong:

  - shell: |
      #!/bin/bash -xe
      export PYTHONUNBUFFERED=true
      export DEVSTACK_GATE_TIMEOUT=120
      export DEVSTACK_GATE_TEMPEST=1
      export DEVSTACK_GATE_TEMPEST_FULL=1
      export PROJECTS="stackforge/devstack-plugin-glusterfs $PROJECTS"
      export DEVSTACK_LOCAL_CONFIG=$(cat <<EOF
      ...

See:
http://logs.openstack.org/86/152286/3/experimental/check-tempest-dsvm-full-glusterfs-centos7/b211fb1/logs/devstack-gate-setup-workspace-new.txt.gz#_2015-02-11_04_00_05_111

and here -
http://logs.openstack.org/86/152286/3/experimental/check-tempest-dsvm-full-glusterfs-centos7/b211fb1/logs/devstack-gate-setup-workspace-new.txt.gz#_2015-02-11_04_00_05_825

However, because of the name mismatch, devstack isn't picking it up.
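
A hedged sketch of what lining the names up might look like (the
enable_plugin form and the values here are illustrative assumptions,
not the exact fix):

  # the repo listed in PROJECTS...
  export PROJECTS="stackforge/devstack-plugin-glusterfs $PROJECTS"
  # ...must match the name/url devstack is told to enable, e.g.:
  export DEVSTACK_LOCAL_CONFIG+=$'\n'"enable_plugin glusterfs https://git.openstack.org/stackforge/devstack-plugin-glusterfs"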

Will try to tighten this up in the future to make the issues more obvious.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Zuul logo proposal

2014-12-03 Thread Sean Dague
On 12/03/2014 12:38 PM, Elizabeth K. Joseph wrote:
> On Wed, Dec 3, 2014 at 9:09 AM, Sean Dague  wrote:
>> On 12/03/2014 12:03 PM, Elizabeth K. Joseph wrote:
>>> Hi everyone,
>>>
>>> I do a fair number of presentations about our infrastructure, and a
>>> few times after showing this slide[0] I've gotten questions about what
>>> Zuul's logo is. Huh! Maybe Zuul should have a logo.
>>>
>>> After describing Zuul as a pink dragon gatekeeper, an open source
>>> contributor + artist friend of mine whipped up this:
>>> http://princessleia.com/temp/Zuul-sketch.jpg
>>
>> 404?
> 
> It's been working fine, and confirmed again it still does. Copy/paste error?
> 

Nope, looks like you have IPv6 on your domain, but not an IPv6 vhost.

ribos:~/code/openstack/devstack-gate(no_grenade_ceilometer)> HEAD
http://princessleia.com/temp/Zuul-sketch.jpg
404 Not Found
Connection: close
Date: Wed, 03 Dec 2014 17:40:26 GMT
Server: Apache/2.2.22 (Debian)
Vary: Accept-Encoding
Content-Type: text/html; charset=iso-8859-1
Client-Date: Wed, 03 Dec 2014 17:40:26 GMT
Client-Peer: 2600:3c00::f03c:91ff:fe89:3d09:80
Client-Response-Num: 1

ribos:~/code/openstack/devstack-gate(no_grenade_ceilometer)> GET
http://princessleia.com/temp/Zuul-sketch.jpg


404 Not Found

Not Found
The requested URL /temp/Zuul-sketch.jpg was not found on this server.

Apache/2.2.22 (Debian) Server at princessleia.com Port 80
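
For reference, the usual fix is to make the vhost answer on the v6
address as well; a hedged sketch (directives elided, the v4 address is
a made-up example, and the v6 one is taken from the trace above):

  <VirtualHost 1.2.3.4:80 [2600:3c00::f03c:91ff:fe89:3d09]:80>
      ServerName princessleia.com
      ...
  </VirtualHost>

(or just bind the vhost to *:80 so it matches every address).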




-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Zuul logo proposal

2014-12-03 Thread Sean Dague
On 12/03/2014 12:03 PM, Elizabeth K. Joseph wrote:
> Hi everyone,
> 
> I do a fair number of presentations about our infrastructure, and a
> few times after showing this slide[0] I've gotten questions about what
> Zuul's logo is. Huh! Maybe Zuul should have a logo.
> 
> After describing Zuul as a pink dragon gatekeeper, an open source
> contributor + artist friend of mine whipped up this:
> http://princessleia.com/temp/Zuul-sketch.jpg

404?

> 
> Thoughts? Comments?
> 
> So far I've mostly heard that it's "too nice" because our Zuul is mean
> - but I'm not sure that's such a bad thing ;)
> 
> As mentioned during our meeting this week, this would show up on
> things like slides and documentation, much like Gerrit's Diffy
> Cuckoo[1] does.
> 
> [0] http://docs.openstack.org/infra/publications/sysadmin-codereview/#(4)
> [1] http://commondatastorage.googleapis.com/gerrit-static/diffy-w200.png
> 


-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] best way to add handlebars.js to infra?

2014-10-01 Thread Sean Dague
On 09/29/2014 07:33 PM, Michael Krotscheck wrote:
> 
> On Sep 29, 2014, at 3:24 PM, Monty Taylor  <mailto:mord...@inaugust.com>> wrote:
> 
>> I am a big fan of the tooling pattern in storyboard-webclient - since
>> it's very much aimed at solving this. Some combination of krotscheck and
>> I will make an example patch we can all point at.
> 
> 
> I’ve added a WIP patch that starts to add the javascript toolchain. More
> work is needed
> https://review.openstack.org/#/c/124927/
> 
> The builds that execute the javascript toolchain itself are found here:
> http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/jobs/javascript.yaml

So I see how that might be used in tox, but how is it plugged into the
installation path?

As a developer, what command do I need to run before:

cd web/src/ && python -m SimpleHTTPServer

...to manually verify?

And how does setup.py install for this project do the right thing?

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] best way to add handlebars.js to infra?

2014-09-29 Thread Sean Dague
Over the weekend I redid elastic-recheck to use handlebars.js instead of
inline dom building... it's so much nicer:
https://github.com/openstack-infra/elastic-recheck/blob/master/web/share/index.html#L61-L79
and
https://github.com/openstack-infra/elastic-recheck/blob/master/web/share/elastic-recheck.js#L82-L85

I'd really like to get this pattern into more of our tooling, as I think
it will expand the scope of the people who can contribute, as they can
address formatting issues outside of javascript logic.

It seems like it would be nice to have it be rooted on
static.openstack.org like jquery is, so we've got a common copy that
everyone can use.

So... questions:

- Should javascript libraries be hosted on static.openstack.org (which
reduces what users need to download) or should we carry them around in
every app (which means static.o.o could be blocked or down, but the app
would still work)?

- Regardless of where they go, how should we get them there via puppet?
 - wget + file install?
 - npm ?
 - bower?
 - some 3rd thing?

Would be good to have a pattern here so we can repeat as required in the
future.
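
For the wget option, a minimal sketch might look like the following
(URL, version, checksum placeholder, and target path are all
assumptions):

  wget -q https://cdnjs.cloudflare.com/ajax/libs/handlebars.js/1.3.0/handlebars.min.js
  # pin the artifact so the mirrored copy is reproducible
  echo "EXPECTED_SHA256  handlebars.min.js" | sha256sum -c -
  install -m 0644 handlebars.min.js /srv/static/lib/handlebars.min.js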

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] Cloudwatt Spamming: Fwd: Jenkins build is back to normal : iaas-cw_console_master-test #2

2014-07-25 Thread Sean Dague
Cloudwatt seems to have some jenkins environment up and is spamming a
ton of people with Jenkins messages.

There is no contact info that looks real here, so I don't know how to
contact them directly. Can anyone in infra / 3rd party testing reach
out to them?

-Sean


 Forwarded Message 
Subject: Jenkins build is back to normal : iaas-cw_console_master-test #2
Date: Fri, 25 Jul 2014 10:51:58 + (UTC)
From: r...@i-cisics-.adm.int0.aub.cloudwatt.net
To: yves-gwenael.bour...@cloudwatt.com, cedric.sou...@cloudwatt.com,
bere...@b1-systems.de, jpom...@linux.vnet.ibm.com, amitp...@gmail.com,
crobe...@redhat.com, ericpeter...@hp.com, tsuf...@mirantis.com,
tqt...@us.ibm.com, lbezd...@redhat.com, mru...@redhat.com,
pbela...@redhat.com, m...@mattfischer.com, sdrapisa...@gmail.com,
lawrancej...@gmail.com, jing.liuq...@99cloud.net, tianli...@awcloud.com,
francois.magi...@objectif-libre.com, julie.gra...@hp.com,
niu.zgli...@gmail.com, robert.miziel...@epitech.eu, da...@dcaudill.com,
c...@us.ibm.com, ogaz...@gmail.com, mich...@ebaysf.com,
openstack-infra@lists.openstack.org, ikolodyaz...@mirantis.com,
ramis...@redhat.com, pawel.skow...@intel.com, kanagaraj.manic...@hp.com,
tnova...@redhat.com, and...@tesora.com, veronica.a.mu...@intel.com,
jpic...@redhat.com, jebl...@openstack.org, david.l...@hp.com,
kevin.stev...@rackspace.com, richard.haga...@hp.com,
akriv...@redhat.com, juan.m.o...@intel.com, alex.gay...@gmail.com,
george.peristera...@enovance.com, gary.w.sm...@hp.com,
jun11matayo...@gmail.com, santiago.b.baldas...@intel.com,
liyingjun1...@gmail.com, rev...@openstack.org, tangmeiya...@gmail.com,
li...@ryanpetrello.com, davide.gue...@hp.com,
mail.ashishchan...@gmail.com, stpie...@metacloud.com, s...@dague.net,
absub...@cisco.com, gloria...@hp.com, slick...@gmail.com,
j...@cs.stanford.edu, r1chardj0...@gmail.com, clay...@oneill.net,
leandro.i.costant...@intel.com, yongli...@intel.com,
mot...@da.jp.nec.com, t.v.ovtchinnik...@gmail.com, rob.raym...@hp.com,
n...@metacloud.com, lin-hua.ch...@hp.com, rodrig...@lsd.ufcg.edu.br,
david.laps...@metacloud.com, lsm...@redhat.com,
andres.buras...@intel.com, ala.rezmer...@cloudwatt.com,
maria.nita...@gmail.com, ihrac...@redhat.com, dragon...@163.com,
openst...@sheep.art.pl, jom...@redhat.com,
guillermo.d.cabr...@intel.com, drf...@us.ibm.com,
dimitri.mazma...@ericsson.com, brianna.pou...@jhuapl.edu,
robert.miziel...@cloudwatt.com, jamielen...@redhat.com,
jroov...@cisco.com, maxime.vid...@enovance.com, matt.w...@hp.com,
masco.kaliyamoor...@enovance.com, na...@ntti3.com, kisp...@gmail.com,
dk...@redhat.com, emag...@gmail.com

See





___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Status of check-tempest-dsvm-f20 job

2014-06-18 Thread Sean Dague
On 06/18/2014 05:45 AM, Eoghan Glynn wrote:
> 
> 
>> On 06/18/2014 06:46 PM, Eoghan Glynn wrote:
>>> If we were to use f20 more widely in the gate (not to entirely
>>> supplant precise, more just to split the load more evenly) then
>>> would the problem observed tend to naturally resolve itself?
>>
>> I would be happy to see that, having spent some time on the Fedora
>> bring-up :) However I guess there is a chicken-egg problem with
>> large-scale roll-out in that the platform isn't quite stable yet.
>> We've hit some things that only really become apparent "in the wild";
>> differences between Rackspace & HP images, issues running on Xen which
>> we don't test much, the odd upstream bug requiring work-arounds [1],
>> etc.
> 
> Fair point.
>  
>> It did seem that devstack changes was the best place to stabilze the
>> job.  However, as is apparent, devstack changes often need to be
>> pushed through quickly and that does not match well with a slightly
>> unstable job.
>>
>> Having it experimental in devstack isn't much help in stabilizing.  If
>> I trigger experimental builds for every devstack change it runs
>> several other jobs too, so really I've just increased contention for
>> limited resources by doing that.
> 
> Very true also.
> 
>> I say this *has* to be running for devstack eventually to stop the
>> fairly frequent breakage of devstack on Fedora, which causes a lot of
>> people wasted time often chasing the same bugs.
> 
> I agree, if we're committed to Fedora being a first class citizen (as
> per TC distro policy, IIUC) then it's crucial that Fedora-specific
> breakages are exposed quickly in the gate, as opposed to being seen
> by developers for the first time in the wild whenever they happen to
> refresh their devstack.
> 
>> But in the mean time, maybe suggestions for getting the Fedora job
>> exposure somewhere else where it can brew and stabilize are a good
>> idea.
> 
> Well, I would suggest the ceilometer/py27 unit test job as a first
> candidate for such exposure.
> 
> The reason being that mongodb 2.4 is not available on precise, but
> is on f20. As a result, the mongodb scenario tests are effectively
> skipped in the ceilo/py27 units, which is clearly badness and needs
> to be addressed.
> 
> Obviously this lack of coverage will resolve itself quickly once
> the Trusty switchover occurs, but it seems like we can short-circuit
> that process by simply switching to f20 right now.
> 
> I think the marconi jobs would be another good candidate, where
> switching over to f20 now would add real value. The marconi tests
> include some coverage against mongodb proper, but this is currently
> disabled, as marconi requires mongodb version >= 2.2 (and precise
> can only offer 2.0.4).
> 
>> We could make a special queue just for f20 that only triggers that
>> job, if others like that idea.
>>
>> Otherwise, ceilometer maybe?  I made some WIP patches [2,3] for this
>> already.  I think it's close, just deciding what tempest tests to
>> match for the job in [2].
> 
> Thanks for that.
> 
> So my feeling is that at least the following would make sense to base
> on f20:
> 
> 1. ceilometer/py27 
> 2. tempest variant with the ceilo DB configured as mongodb
> 3. marconi/py27
> 
> Then a random selection of other p27 jobs could potentially be added
> over time to bring f20 usage up to approximately the same breath
> as precise.
> 
> Cheers,
> Eoghan

Unit test nodes are yet another different image, so this actually
wouldn't make anything better; it would just also stall out ceilometer
and marconi unit tests in the same scenario.

I think the real issue is to come up with a fairer algorithm that
prevents any node class from starving, even in the extreme case, and to
get that implemented and accepted in nodepool.

I do think devstack was the right starting point, because it fixes lots
of issues we've had with us accidentally breaking fedora in devstack.
We've yet to figure out how overall reliable fedora is going to be.

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Status of check-tempest-dsvm-f20 job

2014-06-17 Thread Sean Dague
On 06/17/2014 04:16 PM, Ian Wienand wrote:
> Hi,
> 
> I added an item to today's meeting but we didn't get to it.
> 
> I'd like to bring up the disablement of the F20 based job, disabled in
> [1] with some discussion in [2].
> 
> It's unclear to me why there are insufficient Fedora nodes.  Is the
> problem that Fedora is booting too slowly compared to other
> distributions?  Is there some other Fedora specific issue we can work
> on?
> 
> Demoting to experimental essentially means stopping the job and
> letting it regress; when the job was experimental before I was
> triggering the run for each devstack change (to attempt to maintain
> stability) but this also triggers about 4 other experimental jobs,
> making the load issues even worse.
> 
> What needs to happen before we can get this job promoted again?
> 
> Thanks

It was demoted yesterday when devstack and devstack-gate changes were
stacking up in check waiting on an f20 node to be allocated for a
non-voting job.

When we shut it off, devstack changes had been waiting 5 hrs in check
with no f20 node allocated. One of those was a critical fix for gate
issues, which we ended up manually promoting in the gate.

Because this is the way things degrade when we are using all our quota,
I'm really wary of adding these back until we discuss the expectations
here (possibly in Germany). Devstack often ends up being a knob we can
adjust to dig ourselves out of a gate backup, so giving it extra delay
when we are at load is something I don't think serves us well.

If nodepool (conceptually) filled the longest outstanding requests with
higher priority, I'd be uber happy. This would also help with more fully
using our capacity, because the mix of nodes that we need any given hour
kind of changes. But as jeblair said, this is non trivial to implement.
Ensuring a minimum number of nodes (where that might be 1 or 2) for each
class would have helped this particular situation. We actually had 0
nodes in use or ready of the type at the time.

So I'm in the 'prefer not' camp for devstack right now.

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] elastic search capacity

2014-06-11 Thread Sean Dague
We've not drained the Elastic Search Queue in 2 days, even when we've
not been using all our test node cluster capacity. This strikes me as
very bad.

Elastic Recheck really requires that we can stay current on Elastic
Search, and that seems like a less and less likely outcome.

Will turning off CRM114 help?

Do we need more workers? (possibly more workers made this worse)

Is there something else?

If the answer is "less data", we're going to start hitting the point
where it's not useful any more. And we're only going to get more data.
So I feel like we need to figure out pretty soon if we can keep Elastic
Search or we should start considering other options.

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] disable full recheck if only the commit message changed

2014-06-07 Thread Sean Dague
On 06/07/2014 03:58 AM, Antoine Musso wrote:
> Le 06/06/2014 14:06, Jeremy Stanley a écrit :
>> Changing the commit message changes the Git SHA of the commit, and
>> Gerrit reports this in its event stream as a distinct patchset. Zuul
>> would need to retrieve previous and current patchsets on every event
>> and compare them... I'm not sure the relatively minimal savings
>> would be worth the additional complexity, though others may
>> disagree.
> 
> Hello,
> 
> It might not be that hard to do.  Gerrit has an option to copy previous
> votes when a new patchset is a trivial rebase or has no code change (ie
> only the commit message got changed).
> 
> https://review.openstack.org/Documentation/config-labels.html#label_copyAllScoresIfNoCodeChange
> 
> So potentially Gerrit can be enhanced to attach that informations to the
> new-patchset event sent over stream-events.  From there it will be
> trivial to skip the check in Zuul.
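
For reference, a hedged sketch of that Gerrit project.config knob (the
label name here is illustrative):

  [label "Verified"]
    copyAllScoresIfNoCodeChange = true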

Some projects do commit message enforcement via bots. So before doing
that, you'd want to separate off that enforcement so it could run
independently of the functional tests.

Honestly, though, this currently has the good side effect of running
tests an extra set of times on people's code. And given that the
biggest issue we are fighting right now is that too many bugs have
landed, running tests on change sets more times before they go in isn't
a terrible idea.

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Help with some reviews for devstack/f20 testing

2014-05-20 Thread Sean Dague
On 05/19/2014 10:58 PM, Ian Wienand wrote:
> Hi,
> 
> I've got the devstack on Fedora 20 job working [1] but I just need a
> little help to clear the final reviews for the infra parts.
> 
> In particular, a small series to handle some different log formats on
> Fedora
> 
>  https://review.openstack.org/92748 - distro check fns
>  https://review.openstack.org/93248 - syslog format of f20
>  https://review.openstack.org/93249 - minor cleanup
>  https://review.openstack.org/93250 - rabbitmq check
>  https://review.openstack.org/93251 - different apache logs on f20
> 
> One more important one fixes up log archiving as it turns out rename
> isn't portable:
> 
>  https://review.openstack.org/92981
> 
> Some other things I noticed when working on various issues, not
> critical to the job:
> 
>  https://review.openstack.org/93646 - exit on stack/grenade failure
>  https://review.openstack.org/93862 - handle workflow in "check experimental"
> 
>  https://review.openstack.org/93377 - nova-conductor/nova-compute start order
>sdague -1'd; not sure if following
>patch is sufficient
>  https://review.openstack.org/93375 - nova-conductor log when connected
> 
> I'll be happy to fix up any review comments quickly
> 
> Thanks!
> 
> -i
> 
> [1] devstack installing & tempest running.  There are some workarounds
> required in devstack (see https://review.openstack.org/93845) but
> progress is being made on the upstream bugs.  With those resolved
> in some fashion I'd be fairly confident about moving this to a
> non-voting check job soon.

Ian, thanks for this really clear roadmap. I've gone through and left
feedback on most of the patches. This is looking pretty good, just a
couple of things I think we need to clean up first.

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] project specific gerrit dashboards

2014-05-19 Thread Sean Dague
For Tempest we'd like to make a project specific dashboard of common
items that we'd like people regularly reviewing. This is going to, in
the general case, require me adding support for dashboards to jeepyb.
Which is fine, though I'll need a few pointers.

However...

What we'd also like is to be able to flag particular blueprints that we
want people to look at this week by adding a section to the dashboard.
Which means a piece of the dashboard is going to change pretty rapidly.

Ideally I'd like this tempest dashboard to be able to be reviewed and
approved by the tempest core team, without infra needing to be involved.
However, I'm not sure I see a path in how that would happen given the
weird gerrit ref structure. Anyone have thoughts on a sane way to do that?
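
For reference, a project dashboard in Gerrit is just a git-config style
file committed on a refs/meta/dashboards/* ref; a hedged sketch (the
section titles and queries here are made up):

  [dashboard]
    title = Tempest review inbox
  [section "This week's blueprints"]
    query = status:open project:openstack/tempest topic:bp/some-blueprint
  [section "Needs final +2"]
    query = status:open project:openstack/tempest label:Code-Review+2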

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] [openstack-dev] [infra] Consolidating efforts around Fedora/Centos gate job

2014-04-11 Thread Sean Dague
On 04/11/2014 01:43 AM, Ian Wienand wrote:
> Hi,
> 
> To summarize recent discussions, nobody is opposed in general to
> having Fedora / Centos included in the gate.  However, it raises a
> number of "big" questions : which job(s) to run on Fedora, where does
> the quota for extra jobs come from, how do we get the job on multiple
> providers, how stable will it be, how will we handle new releases,
> centos v fedora, etc.
> 
> I think we agreed in [1] that the best thing to do is to start small,
> get some experience with multiple platforms and grow from there.  Thus
> the decision to target a single job to test just incoming devstack
> changes on Fedora 20.  This is a very moderate number of changes, so
> adding a separate test will not have a huge impact on resources.
> 
> Evidence points to this being a good point to start.  People
> submitting to devstack might have noticed comments from "redhatci"
> like [2] which reports runs of their change on a variety of rpm-based
> distros.  Fedora 20 has been very stable, so we should not have many
> issues.  Making sure it stays stable is very useful to build on for
> future gate jobs.
> 
> I believe we decided that to make a non-voting job we could just focus
> on running on Rackspace and avoid the issues of older fedora images on
> hp cloud.  Longer term, either a new hp cloud version comes, or DIB
> builds the fedora images ... either way we have a path to upgrading it
> to a voting job in time.  Another proposal was to use the ooo cloud,
> but dprince feels that is probably better kept separate.
> 
> Then we have the question of the nodepool setup scripts working on
> F20.  I just tested the setup scripts from [3] and it all seems to
> work on a fresh f20 cloud image.  I think this is due to kchamart,
> peila2 and others who've fixed parts of this before.
> 
> So, is there:
> 
>  1) anything blocking having f20 in the nodepool?
>  2) anything blocking a simple, non-voting job to test devstack
> changes on f20?
> 
> Thanks,
> 
> -i
> 
> [1] 
> http://eavesdrop.openstack.org/meetings/infra/2014/infra.2014-04-08-19.01.log.html#l-89
> [2] http://people.redhat.com/~iwienand/86310/
> [3] 
> https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/nodepool/scripts

I can't speak to #1, however I'm +1 on this effort. Would love to have
devstack running on Fedora on changes in general, and especially on
devstack changes, as we accidentally break Fedora far too often.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Multi-node aspects of the infra

2014-04-04 Thread Sean Dague
On 04/03/2014 10:30 AM, Jérôme Gallard wrote:
> Hi James,
> 
> Thanks for your answer!
> As Mathieu said, we are currently working on the multi-node part of the
> project and try to understand how the multi-node feature of nodepool
> could be used by the other components of the infra.
> 
> Does something like the following lines make sense?
> 
> + when a job with multi-node is required :
> - Jenkins selects a slave who has (for instance) the
> "multi-devstack-precise" label
> - Jenkins clones devstack-gate inside the primary node
> - devstack-gate is executed "as usual" on the primary node (call of
> devstack-vm-gate-wrap.sh --> setup_workspace --> setup_project -->
> devstack-vm-gate.sh --> setup_localrc --> stack.sh)
> - at the end of the execution of stack.sh, new code in
> devstack-vm-gate.sh will SSH into the "subnodes" using the
> /etc/nodepool files (this step can probably be done in parallel)
> * devstack-gate is cloned inside the subnodes
> * devstack-vm-gate-wrap.sh is executed inside the subnodes with
> specific environment variables in order to generate a good localrc for
> the subnodes (some modifications of setup_localrc will probably be
> needed)

This looks sound. We'll need a new service selector here for the
subnodes, as they will be very different in config (and should take
*far* less time to come up).
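
A hedged sketch of how different a subnode's localrc might be, using
standard devstack variables (the service list and how the primary's IP
gets templated in are assumptions):

  # subnode localrc: compute-only, pointing back at the primary node
  ENABLED_SERVICES=n-cpu,n-net,n-api-meta
  SERVICE_HOST=$PRIMARY_NODE_IP
  MYSQL_HOST=$SERVICE_HOST
  RABBIT_HOST=$SERVICE_HOST
  DATABASE_TYPE=mysql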

As soon as we have anything even remotely starting to function we should
do a non-voting job on devstack and devstack-gate, as I expect we'll be
tweaking a bunch there to get something repeatable.

But I think this is a very solid plan.

I also think this would be a great Summit Session for Atlanta as
hopefully we'll have some prelim work happening and can figure out next
steps.

This - https://github.com/sdague/devstack-vagrant - has some
demonstration of multi-node devstack in a local environment. There may
be handy bits in there to figure out how to lift this into devstack-gate
feature selection.
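
To sketch the fan-out step concretely: a rough bash outline of what the
primary node could do, where the /etc/nodepool file names and the
SUB_NODE flag are illustrative assumptions, not a settled interface:

# rough sketch only; assumes nodepool writes one subnode IP per line to
# /etc/nodepool/sub_nodes and an ssh key to /etc/nodepool/id_rsa
while read -r subnode; do
    ssh -i /etc/nodepool/id_rsa "jenkins@$subnode" "
        git clone https://git.openstack.org/openstack-infra/devstack-gate
        SUB_NODE=True ./devstack-gate/devstack-vm-gate-wrap.sh
    " &
done < /etc/nodepool/sub_nodes
wait    # block until every subnode has finished stacking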

-Sean

-- 
Sean Dague
http://dague.net





Re: [OpenStack-Infra] Announcing a new infrastructure project, Vinz code review system.

2014-03-18 Thread Sean Dague
On 03/18/2014 07:49 AM, Monty Taylor wrote:
> On 03/18/2014 06:55 AM, Sean Dague wrote:
>> On 03/18/2014 06:04 AM, Thierry Carrez wrote:
>>> Monty Taylor wrote:
>>>> Which means, although I tend to have a side which agrees with Clint and
>>>> Clark that replacing gerrit is a bit of a potential giant rathole - I
>>>> also think that making a scalable thing that architecturally fits with
>>>> the other things we've got would be nice.
>>>
>>> It would be nice. But I think there are two major differences between
>>> StoryBoard and Vinz from a development effort perspective...
>>>
>>> First, task tracking is fundamentally simple. As my POC proved, it's
>>> just a few database tables -- the complexity you add on top of it is
>>> just pure bonus. I see the basic code review functionality and the
>>> manipulation of associated git repositories as a much more complex
>>> endeavor.
>>>
>>> Second, I think Launchpad created a lot of developer itches that
>>> StoryBoard hopefully will allow them to scratch. So we can hope to see
>>> our developers help with StoryBoard in the future... I'm not sure Gerrit
>>> created enough pain so far. Yes it's ugly but it gets the work done
>>> pretty well. Could you point to a specific shortcoming that you can't
>>> address in the legacy app and that would be the killer feature ? In
>>> StoryBoard, that would be the ability to track cross-project features
>>> with tons of tasks.
>>>
>>> Don't get me wrong, I don't want to prevent anyone from working on
>>> anything they would like to. I'm just afraid that this is a large
>>> project and I don't see it attracting enough contributors to be a
>>> long-term sustainable alternative... potentially making it a gigantic
>>> distraction.
>>
>> I agree that if it's a problem that a set of people really want to
>> solve, so be it.
>>
>> Personally though, as one of the people that's probably spent as much
>> time in Gerrit as anyone -
>> http://stackalytics.com/?release=icehouse&metric=marks&project_type=openstack&module=&company=&user_id=
>>
>> I think the bar for replacement is really high. Because if a new tool
>> impacts my ability to review code in any negative way, be it accuracy or
>> volume, then I'm going to be properly annoyed.
>>
>> Which I do think is a difference between StoryBoard and this. With
>> launchpad the power users stopped being able to do their job at all, to
>> the point where many projects largely opt out of blueprints / bugs.
>> That's not true for code review. It's actually kind of the opposite, as
>> we have started moving non code things into Gerrit because it's actually
>> very good at its job of recording votes, seeing specific comments, and
>> recording history.
>>
>> So this has to not only be better for deployers, but it has to be better
>> for us as core reviewers. Review bandwidth is our number one constrained
>> resource in OpenStack, and has been for years, so any negative impact
>> there would be as damaging to a release as us disabling the gate
>> entirely.
>>
>> So who on the Vinz design team has regularly done 2000 gerrit reviews a
>> year to ensure that level of throughput isn't impacted (in any gerrit,
>> doesn't have to be the community one)? Because that's my primary
>> concern. A new project to replace a key system I rely on every day.
>> Whose quirks I've come to understand well. With a set of tools that I
>> have to further optimize it. And a team that I've never heard of before,
>> that I see no track record of using the community gerrit in any volume,
>> coming forward to propose a replacement. So please understand I have
>> very deep concerns.
> 
> I think this is a bit harsh on the folks suggesting that they work on this.
> 
> Before I expand on that - I'd like to point out that gerrit is TERRIBLE
> at dealing with the volume of reviews we all have. I believe the nova
> core team recently was discussing putting attempts at new UI shims on
> top of gerrit to try to deal with queuing and prioritization better. I,
> for one, cannot deal with the mass of stuff I'm supposed to be reviewing
> worth a crap. I'd love a better UI.
> 
> Back to the vinz proposal.
> 
> The team who is proposing it has no track record in infra. However, they
> did start exactly right - they contacted both Jim and me (and apparently
> fungi) and we talked w

Re: [OpenStack-Infra] Announcing a new infrastructure project, Vinz code review system.

2014-03-18 Thread Sean Dague
On 03/18/2014 06:04 AM, Thierry Carrez wrote:
> Monty Taylor wrote:
>> Which means, although I tend to have a side which agrees with Clint and
>> Clark that replacing gerrit is a bit of a potential giant rathole - I
>> also think that making a scalable thing that architecturally fits with
>> the other things we've got would be nice.
> 
> It would be nice. But I think there are two major differences between
> StoryBoard and Vinz from a development effort perspective...
> 
> First, task tracking is fundamentally simple. As my POC proved, it's
> just a few database tables -- the complexity you add on top of it is
> just pure bonus. I see the basic code review functionality and the
> manipulation of associated git repositories as a much more complex endeavor.
> 
> Second, I think Launchpad created a lot of developer itches that
> StoryBoard hopefully will allow them to scratch. So we can hope to see
> our developers help with StoryBoard in the future... I'm not sure Gerrit
> created enough pain so far. Yes it's ugly but it gets the work done
> pretty well. Could you point to a specific shortcoming that you can't
> address in the legacy app and that would be the killer feature ? In
> StoryBoard, that would be the ability to track cross-project features
> with tons of tasks.
> 
> Don't get me wrong, I don't want to prevent anyone from working on
> anything they would like to. I'm just afraid that this is a large
> project and I don't see it attracting enough contributors to be a
> long-term sustainable alternative... potentially making it a gigantic
> distraction.

I agree that if it's a problem that a set of people really want to
solve, so be it.

Personally though, as one of the people that's probably spent as much
time in Gerrit as anyone -
http://stackalytics.com/?release=icehouse&metric=marks&project_type=openstack&module=&company=&user_id=
I think the bar for replacement is really high. Because if a new tool
impacts my ability to review code in any negative way, be it accuracy or
volume, then I'm going to be properly annoyed.

Which I do think is a difference between StoryBoard and this. With
launchpad the power users stopped being able to do their job at all, to
the point where many projects largely opt out of blueprints / bugs.
That's not true for code review. It's actually kind of the opposite, as
we have started moving non code things into Gerrit because it's actually
very good at its job of recording votes, seeing specific comments, and
recording history.

So this has to not only be better for deployers, but it has to be better
for us as core reviewers. Review bandwidth is our number one constrained
resource in OpenStack, and has been for years, so any negative impact
there would be as damaging to a release as us disabling the gate entirely.

So who on the Vinz design team has regularly done 2000 gerrit reviews a
year to ensure that level of throughput isn't impacted (in any gerrit,
doesn't have to be the community one)? Because that's my primary
concern. A new project to replace a key system I rely on every day.
Whose quirks I've come to understand well. With a set of tools that I
have to further optimize it. And a team that I've never heard of before,
that I see no track record of using the community gerrit in any volume,
coming forward to propose a replacement. So please understand I have
very deep concerns.

Storyboard was started by the person who was the #1 user of Launchpad in
our Community, because launchpad was a giant efficiency problem in
making good OpenStack releases.

So this is not the same thing as Storyboard.

There are other options besides wholesale replacement. For instance,
with a modern Gerrit the UI could be replaced with a custom one built on
top of the REST api. That seems like a better starting point, as you
could sort out the UX challenges first, get an alt interfaces that we
all agree on, get tons of feedback in a live / high volume environment.
It can be a 2nd interface that gets used along side the existing one for
a long time, and constantly iterated on to demonstrate improvements.

Then the backend switch could be taken on after the UX has proven
itself. This would have the advantage of being able to be dogfooded
really soon (as soon as the gerrit 2.8 deploy completes). It also means
that for UI that wasn't completed yet, it could fall back to gerrit
interfaces.
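
As a data point on feasibility: a UI shim only needs the REST endpoints,
which any modern (2.6+) Gerrit serves; note Gerrit prefixes its JSON with
a )]}' guard line that a client has to strip. A minimal sketch against a
hypothetical upgraded server:

# list open changes as JSON; tail strips Gerrit's )]}' XSSI guard line
curl -s 'https://review.openstack.org/changes/?q=status:open&n=25' | tail -n +2

(Our current 2.4 doesn't serve /changes/ yet, which is rather the point.)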

Anyway, realize that unlike launchpad, gerrit actually has fans. And
while I 100% agree Google doesn't know how to run an open source project
(which I think is a challenge for any organization that largely
collocates their teams, as anyone not within walking distance is
*other*), what they've managed to produce is still pretty reasonable for
those of us using it every d

Re: [OpenStack-Infra] Announcing a new infrastructure project, Vinz code review system.

2014-03-17 Thread Sean Dague
On 03/17/2014 04:20 PM, Philip Schwartz wrote:
> 
> 
> On 3/17/14, 3:43 PM, "Clint Byrum"  wrote:
> 
>> According to ohloh.net:
>>
>> http://www.ohloh.net/p/android - 576 developers
>> http://www.ohloh.net/p/mediawiki - 233 developers
>> http://www.ohloh.net/p/openstack - 1556 developers
> 
> These numbers only tell part of the total. There are 3 major Android
> projects that are using gerrit (one of which is google's android project
> itself). This totals close to 1700 developers.
> 
> As for mediawiki, the base mediawiki code is only a portion of what is
> managed by their Gerrit install. They have almost 400 independent projects
> managed in their system.
> 
> Yes code wise for an individual project, OpenStack has more contributors,
> but when it comes to the mass of the Gerrit install, ours is still much
> smaller.


ssh review.openstack.org gerrit ls-projects | wc -l
254

So, smaller, but I'm not sure I'd say *much* smaller.

> This is part of what leads us to be considered smaller. Both of the
> aforementioned projects not only run their own development; judging from
> the contributor list in gerrit itself, members of those projects account
> for almost 75% of the development of Gerrit, if not more. If we picked up
> contributing on a steady basis to their repositories, we would have more
> of a say in the process. I am just not convinced that adding to a
> monolithic Java application is worth our time.
> 
> Yes, there are contributors to OpenStack that are learning python and have
> focused more in the past on Java development, but I would think they would
> be in the minority when it comes to the community as a whole.
> 
>> Is it unreasonable to try and add a better system for extending and
>> integrating with Gerrit? I go back to the fact that the challenge for a
>> Python team with a python plugin will only be slightly less, so I don't
>> think this is a really great argument for writing something from
>> Scratch.
> 
> This is a reasonable argument and was looked at as a first measure. But it
> doesn't solve a lot of the issues that are seen with Gerrit because we
> would still be limited to what Gerrit plugins can do and the minimal set
> of triggers it currently supports.
> 
> Would the time be better spent writing addons like this to Gerrit which
> would be little more than hacks to work around where we see shortcomings
> or issues, or spend that time working on something that would be built
> from the ground up to solve the issues as we see fit? I personally think
> the time would be better spent working on a solution that solves our
> issues and workflow better.
> 
>> Storyboard is not a good example of a reason to do this. Storyboard
>> is coming from a place where there really isn't a tool that will meet
>> the needs of OpenStack's processes. Launchpad is already written in
>> Python, but we are not going to fix/extend it because it is an extremely
>> monolithic code base and we're not really interested in rewriting it to
>> be lean.
>>
>> To my mind, Gerrit is doing its job well, and we'd just like to make
>> some incremental improvements.
> 
> From how I see it and after many discussions with members of the Infra
> team, I would see Gerrit in the same boat as launchpad. It is a monolithic
> application whose structure/design our developers are not versed in.
> 
> The Storyboard project came to life for a need to have a system that meets
> the OpenStack project needs better then Launchpad does. I have proposed
> Vinz due to the same type of need. Gerrit works for our needs just as much
> as launchpad worked for our needs 2+ years ago. It can continue to meet
> our base needs, but as we grow and more external test systems are added
> internal and external to infra, at what point is living with something
> shoehorned into our workflow no longer enough?
> 
> The whole point of this proposal is to get ahead of the day that Gerrit won't fit
> our workflow easily by having something ready that will be easier for us
> to maintain and modify.

In fairness Storyboard came to life after we got to the point of having
a substantial number of artifacts that we could no longer manipulate in
Launchpad because it would time out on API calls 100% of the time.

>> Please please, even if you go through, do not integrate tightly with
>> Storyboard. Use the API it provides, drive the API development, and keep
>> them modular so that we can improve them at different paces.
> 
> The ideas for integration are purely through the API. Integrating beyond that
> will cause more headaches than the benefits it might create.

Have you considered other open source efforts to build upon,

Re: [OpenStack-Infra] [openstack-dev] Intermittent failures cloning noVNC from github.com/kanaka

2014-03-13 Thread Sean Dague
git_clone https://github.com/kanaka/noVNC.git 
> /opt/stack/noVNC master 
> 
> 2014-03-11 15:00:33.786 | + GIT_REMOTE= https://github.com/kanaka/noVNC.git 
> 
> 2014-03-11 15:00:33.788 | + GIT_DEST=/opt/stack/noVNC 
> 
> 2014-03-11 15:00:33.789 | + GIT_REF=master 
> 
> 2014-03-11 15:00:33.790 | ++ trueorfalse False False 
> 
> 2014-03-11 15:00:33.791 | + RECLONE=False 
> 
> 2014-03-11 15:00:33.792 | + [[ False = \T\r\u\e ]] 
> 
> 2014-03-11 15:00:33.793 | + echo master 
> 
> 2014-03-11 15:00:33.794 | + egrep -q '^refs' 
> 
> 2014-03-11 15:00:33.795 | + [[ ! -d /opt/stack/noVNC ]] 
> 
> 2014-03-11 15:00:33.796 | + [[ False = \T\r\u\e ]] 
> 
> 2014-03-11 15:00:33.797 | + git_timed clone 
> https://github.com/kanaka/noVNC.git /opt/stack/noVNC 
> 
> 2014-03-11 15:00:33.798 | + local count=0 
> 
> 2014-03-11 15:00:33.799 | + local timeout=0 
> 
> 2014-03-11 15:00:33.801 | + [[ -n 0 ]] 
> 
> 2014-03-11 15:00:33.802 | + timeout=0 
> 
> 2014-03-11 15:00:33.803 | + timeout -s SIGINT 0 git clone 
> https://github.com/kanaka/noVNC.git /opt/stack/noVNC 
> 
> 2014-03-11 15:00:33.804 | Cloning into '/opt/stack/noVNC'... 
> 
> 2014-03-11 15:03:13.694 | error: RPC failed; result=56, HTTP code = 200 
> 
> 2014-03-11 15:03:13.695 | fatal: The remote end hung up unexpectedly 
> 
> 2014-03-11 15:03:13.697 | fatal: early EOF 
> 
> 2014-03-11 15:03:13.698 | fatal: index-pack failed 
> 
> 2014-03-11 15:03:13.699 | + [[ 128 -ne 124 ]] 
> 
> 2014-03-11 15:03:13.700 | + die 596 'git call failed: [git clone' 
> https://github.com/kanaka/noVNC.git '/opt/stack/noVNC]' 
> 
> 2014-03-11 15:03:13.701 | + local exitcode=0 
> 
> 2014-03-11 15:03:13.702 | [Call Trace] 
> 
> 2014-03-11 15:03:13.703 | ./stack.sh:736:install_nova 
> 
> 2014-03-11 15:03:13.705 | /var/lib/jenkins/devstack/lib/nova:618:git_clone 
> 
> 2014-03-11 15:03:13.706 | 
> /var/lib/jenkins/devstack/functions-common:543:git_timed 
> 
> 2014-03-11 15:03:13.707 | /var/lib/jenkins/devstack/functions-common:596:die 
> 
> 2014-03-11 15:03:13.708 | [ERROR] 
> /var/lib/jenkins/devstack/functions-common:596 git call failed: [git clone 
> https://github.com/kanaka/noVNC.git /opt/stack/noVNC] 
> 
> 
> 
> 
> 
> Example 2: 
> 
> 
> 2014-03-11 14:12:58.472 | + is_service_enabled n-novnc 
> 2014-03-11 14:12:58.473 | + return 0 
> 2014-03-11 14:12:58.474 | ++ trueorfalse False 
> 2014-03-11 14:12:58.475 | + NOVNC_FROM_PACKAGE=False 
> 2014-03-11 14:12:58.476 | + '[' False = True ']' 
> 2014-03-11 14:12:58.477 | + NOVNC_WEB_DIR=/opt/stack/noVNC 
> 2014-03-11 14:12:58.478 | + git_clone https://github.com/kanaka/noVNC.git 
> /opt/stack/noVNC master 
> 2014-03-11 14:12:58.479 | + GIT_REMOTE= https://github.com/kanaka/noVNC.git 
> 2014-03-11 14:12:58.480 | + GIT_DEST=/opt/stack/noVNC 
> 2014-03-11 14:12:58.481 | + GIT_REF=master 
> 2014-03-11 14:12:58.482 | ++ trueorfalse False False 
> 2014-03-11 14:12:58.483 | + RECLONE=False 
> 2014-03-11 14:12:58.484 | + [[ False = \T\r\u\e ]] 
> 2014-03-11 14:12:58.485 | + echo master 
> 2014-03-11 14:12:58.486 | + egrep -q '^refs' 
> 2014-03-11 14:12:58.487 | + [[ ! -d /opt/stack/noVNC ]] 
> 2014-03-11 14:12:58.488 | + [[ False = \T\r\u\e ]] 
> 2014-03-11 14:12:58.489 | + git_timed clone 
> https://github.com/kanaka/noVNC.git /opt/stack/noVNC 
> 2014-03-11 14:12:58.490 | + local count=0 
> 2014-03-11 14:12:58.491 | + local timeout=0 
> 2014-03-11 14:12:58.492 | + [[ -n 0 ]] 
> 2014-03-11 14:12:58.493 | + timeout=0 
> 2014-03-11 14:12:58.494 | + timeout -s SIGINT 0 git clone 
> https://github.com/kanaka/noVNC.git /opt/stack/noVNC 
> 2014-03-11 14:12:58.495 | Cloning into '/opt/stack/noVNC'... 
> 2014-03-11 14:14:02.315 | error: The requested URL returned error: 403 while 
> accessing https://github.com/kanaka/noVNC.git/info/refs 
> 2014-03-11 14:14:02.316 | fatal: HTTP request failed 
> 2014-03-11 14:14:02.317 | + [[ 128 -ne 124 ]] 
> 2014-03-11 14:14:02.318 | + die 596 'git call failed: [git clone' 
> https://github.com/kanaka/noVNC.git '/opt/stack/noVNC]' 
> 2014-03-11 14:14:02.319 | + local exitcode=0 
> 2014-03-11 14:14:02.321 | [Call Trace] 
> 2014-03-11 14:14:02.322 | ./stack.sh:736:install_nova 
> 2014-03-11 14:14:02.323 | /var/lib/jenkins/devstack/lib/nova:618:git_clone 
> 2014-03-11 14:14:02.324 | 
> /var/lib/jenkins/devstack/functions-common:543:git_timed 
> 2014-03-11 14:14:02.326 | /var/lib/jenkins/devstack/functions-common:596:die 
> 2014-03-11 14:14:02.327 | [ERROR] 
> /var/lib/jenkins/devstack/functions-common:596 git call failed: [git clone 
> https://github.com
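
For what it's worth, the usual local mitigation for flaky clones is a
retry wrapper in the spirit of devstack's git_timed; a minimal sketch
(the retry count, timeout, and sleep values are illustrative):

# minimal retry sketch; counts/timeouts are illustrative, not devstack's
clone_with_retry() {
    local url=$1 dest=$2 attempt=0
    until timeout -s SIGINT 300 git clone "$url" "$dest"; do
        attempt=$((attempt + 1))
        [ "$attempt" -ge 3 ] && return 1
        rm -rf "$dest"      # a partial clone can't be resumed
        sleep 30
    done
}
clone_with_retry https://github.com/kanaka/noVNC.git /opt/stack/noVNC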

Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"

2014-03-10 Thread Sean Dague
So, honestly, running stack.sh / unstack.sh that many times in a row
really isn't expected to work in my experience. You should at minimum be
doing ./clean.sh to try to reset the state further.
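
Concretely, the cycle that has a chance of being repeatable is:

# clean.sh ships with devstack and tears down state unstack.sh leaves behind
cd ~/devstack
./unstack.sh
./clean.sh
./stack.sh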

-Sean

On 03/10/2014 03:00 PM, Dane Leblanc (leblancd) wrote:
> In my case, the base OS is 12.04 Precise.
> 
> The problem is intermittent in that it takes maybe 15 to 20 cycles of 
> unstack/stack to get it into the failure mode, but once in the failure mode, 
> it appears that tgt daemon is 100% dead-in-the-water.
> 
> -Original Message-
> From: Sean Dague [mailto:s...@dague.net] 
> Sent: Monday, March 10, 2014 1:49 PM
> To: Dane Leblanc (leblancd); openstack-infra@lists.openstack.org
> Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: 
> job failed to start"
> 
> What base OS? A change was made there recently to better handle debian 
> because we believed (possibly incorrectly) that precise actually had working 
> init scripts.
> 
> It would be interesting to understand if this was a 100% failure, or only 
> intermittent, and what base OS it was on.
> 
>   -Sean
> 
> On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:
>> I don't know if anyone can give me some troubleshooting advice with this 
>> issue.
>>
>> I'm seeing an occasional problem whereby after several DevStack 
>> unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during 
>> Cinder startup.  Here's a snippet from the stack.sh log:
>>
>> 2014-03-10 07:09:45.214 | Starting Cinder
>> 2014-03-10 07:09:45.215 | + return 0
>> 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf
>> 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d
>> 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]
>> 2014-03-10 07:09:45.219 | + is_ubuntu
>> 2014-03-10 07:09:45.220 | + [[ -z deb ]]
>> 2014-03-10 07:09:45.221 | + '[' deb = deb ']'
>> 2014-03-10 07:09:45.222 | + sudo service tgt restart
>> 2014-03-10 07:09:45.223 | stop: Unknown instance: 
>> 2014-03-10 07:09:45.619 | start: Job failed to start 
>> jenkins@neutronpluginsci:~/devstack$ 2014-03-10 07:09:45.621 | + 
>> exit_trap
>> 2014-03-10 07:09:45.622 | + local r=1
>> 2014-03-10 07:09:45.623 | ++ jobs -p
>> 2014-03-10 07:09:45.624 | + jobs=
>> 2014-03-10 07:09:45.625 | + [[ -n '' ]]
>> 2014-03-10 07:09:45.626 | + exit 1
>>
>> If I try to restart tgt manually without success:
>>
>> jenkins@neutronpluginsci:~$ sudo service tgt restart
>> stop: Unknown instance: 
>> start: Job failed to start
>> jenkins@neutronpluginsci:~$ sudo tgtd
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?
>> (null): fcoe_init(214) (null)
>> (null): fcoe_create_interface(171) no interface specified.
>> jenkins@neutronpluginsci:~$
>>
>> The config in /etc/tgt is:
>>
>> jenkins@neutronpluginsci:/etc/tgt$ ls -l total 8 drwxr-xr-x 2 root 
>> root 4096 Mar 10 07:03 conf.d
>> lrwxrwxrwx 1 root root   30 Mar 10 06:50 stack.d -> 
>> /opt/stack/data/cinder/volumes
>> -rw-r--r-- 1 root root   58 Mar 10 07:07 targets.conf
>> jenkins@neutronpluginsci:/etc/tgt$ cat targets.conf include 
>> /etc/tgt/conf.d/*.conf include /etc/tgt/stack.d/* 
>> jenkins@neutronpluginsci:/etc/tgt$ ls conf.d 
>> jenkins@neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes 
>> jenkins@neutronpluginsci:/etc/tgt$
>>
>> I don't know if there's any missing Cinder config in my DevStack localrc 
>> files. Here's one that I'm using:
>>
>> MYSQL_PASSWORD=nova
>> RABBIT_PASSWORD=nova
>> SERVICE_TOKEN=nova
>> SERVICE_PASSWORD=nova
>> ADMIN_PASSWORD=nova
>> ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder
>> ,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit
>> enable_service mysql
>> disable_service n-net
>> enable_service q-svc
>> enable_service q-agt
>> enable_service q-l3
>> enable_service q-dhcp
>> enable_service q-meta
>> enable_service q-lbaas
>> enable_service neutron
>> enable_service tempest
>> VOLUME_BACKING_FILE_SIZE=2052M
>> Q_PLUGIN=cisco
>> declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus) declare -A 
>> Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutron
>> pluginsci:1/9) 
>> NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git
>> PHYSICAL_NETWORK=physnet1
>>

Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"

2014-03-10 Thread Sean Dague
What base OS? A change was made there recently to better handle debian
because we believed (possibly incorrectly) that precise actually had
working init scripts.

It would be interesting to understand if this was a 100% failure, or
only intermittent, and what base OS it was on.

-Sean

On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:
> I don't know if anyone can give me some troubleshooting advice with this 
> issue.
> 
> I'm seeing an occasional problem whereby after several DevStack 
> unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during 
> Cinder startup.  Here's a snippet from the stack.sh log:
> 
> 2014-03-10 07:09:45.214 | Starting Cinder
> 2014-03-10 07:09:45.215 | + return 0
> 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf
> 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d
> 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]
> 2014-03-10 07:09:45.219 | + is_ubuntu
> 2014-03-10 07:09:45.220 | + [[ -z deb ]]
> 2014-03-10 07:09:45.221 | + '[' deb = deb ']'
> 2014-03-10 07:09:45.222 | + sudo service tgt restart
> 2014-03-10 07:09:45.223 | stop: Unknown instance: 
> 2014-03-10 07:09:45.619 | start: Job failed to start
> jenkins@neutronpluginsci:~/devstack$ 2014-03-10 07:09:45.621 | + exit_trap
> 2014-03-10 07:09:45.622 | + local r=1
> 2014-03-10 07:09:45.623 | ++ jobs -p
> 2014-03-10 07:09:45.624 | + jobs=
> 2014-03-10 07:09:45.625 | + [[ -n '' ]]
> 2014-03-10 07:09:45.626 | + exit 1
> 
> If I try to restart tgt manually without success:
> 
> jenkins@neutronpluginsci:~$ sudo service tgt restart
> stop: Unknown instance: 
> start: Job failed to start
> jenkins@neutronpluginsci:~$ sudo tgtd
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?
> (null): fcoe_init(214) (null)
> (null): fcoe_create_interface(171) no interface specified.
> jenkins@neutronpluginsci:~$
> 
> The config in /etc/tgt is:
> 
> jenkins@neutronpluginsci:/etc/tgt$ ls -l
> total 8
> drwxr-xr-x 2 root root 4096 Mar 10 07:03 conf.d
> lrwxrwxrwx 1 root root   30 Mar 10 06:50 stack.d -> 
> /opt/stack/data/cinder/volumes
> -rw-r--r-- 1 root root   58 Mar 10 07:07 targets.conf
> jenkins@neutronpluginsci:/etc/tgt$ cat targets.conf
> include /etc/tgt/conf.d/*.conf
> include /etc/tgt/stack.d/*
> jenkins@neutronpluginsci:/etc/tgt$ ls conf.d
> jenkins@neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes
> jenkins@neutronpluginsci:/etc/tgt$ 
> 
> I don't know if there's any missing Cinder config in my DevStack localrc 
> files. Here's one that I'm using:
> 
> MYSQL_PASSWORD=nova
> RABBIT_PASSWORD=nova
> SERVICE_TOKEN=nova
> SERVICE_PASSWORD=nova
> ADMIN_PASSWORD=nova
> ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit
> enable_service mysql
> disable_service n-net
> enable_service q-svc
> enable_service q-agt
> enable_service q-l3
> enable_service q-dhcp
> enable_service q-meta
> enable_service q-lbaas
> enable_service neutron
> enable_service tempest
> VOLUME_BACKING_FILE_SIZE=2052M
> Q_PLUGIN=cisco
> declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus)
> declare -A 
> Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutronpluginsci:1/9)
> NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git
> PHYSICAL_NETWORK=physnet1
> OVS_PHYSICAL_BRIDGE=br-eth1
> TENANT_VLAN_RANGE=810:819
> ENABLE_TENANT_VLANS=True
> API_RATE_LIMIT=False
> VERBOSE=True
> DEBUG=True
> LOGFILE=/opt/stack/logs/stack.sh.log
> USE_SCREEN=True
> SCREEN_LOGDIR=/opt/stack/logs
> 
> Here are links to a log showing another localrc file that I use, and the 
> corresponding stack.sh log:
> 
> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_log.txt
> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_log.txt
> 
> Does anyone have any advice on how to debug this, or recover from this 
> (beyond rebooting the node)? Or am I missing any Cinder config?
> 
> Thanks in advance for any help on this!!!
> Dane
> 
> 
> 
> ___
> OpenStack-Infra mailing list
> OpenStack-Infra@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
> 


-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [OpenStack-Infra] check pipeline triggered when comments are added

2014-03-04 Thread Sean Dague
Correct, when we did the analysis of the giant backup in Feb, a very
large number of the gate resets were caused by people pushing old
patches in that could not pass tests. People would +A patches that had
test results that were 6 weeks old.

I didn't do careful analysis here, but based on the number of patches I
manually pulled out, it wouldn't surprise me if 1/3 of the resets were
caused by this. I also sent multiple emails to the openstack-dev mailing
list, to try to change the user behavior here, and it didn't work.

Because with a complicated system like OpenStack, with > 100 python
requirements, and 10 integrated projects, the reason a patch passes or
doesn't is very much related to its interaction with the rest of the
projects and dependencies. In 6 weeks we probably took at least as many
dependency updates. And we merged 2000 patches across the rest of
OpenStack. So the universe in which that patch passed 6 weeks ago is
completely invalid.

So we encoded it into the core logic. And it is directly responsible for
the increased gate stability in icehouse-3.

But all that being said, I guess I'd ask the base question: what part of
having fresh test results do you find annoying?

Because that I don't actually understand. :)

-Sean

On 03/04/2014 05:02 AM, Michael Still wrote:
> Yes. Patches with very old checks were hitting the merge pipeline,
> where their failure was very expensive. Instead, they get rechecked
> every few days to ensure the author knows when the patch has zero
> chance of merging.
> 
> Michael
> 
> On Tue, Mar 4, 2014 at 8:31 PM, Antoine Musso  wrote:
>> Hello,
>>
>> For a couple weeks I noticed the Zuul 'check' pipeline is being
>> triggered when adding a single comment on an old patch.
>>
>> I find it annoying when a patchset review takes a couple weeks or so.
>> The review would usually lead to a new patchset and I don't think the
>> intermediary checks are necessary
>>
>> I think it is caused by: https://review.openstack.org/#/c/73418/ which adds:
>>
>>   trigger:
>>     gerrit:
>>       - event: comment-added
>>         require-approval:
>>           - username: 'jenkins'
>>             older-than: 72h
>>
>> Any reason for that?
>>
>> --
>> Antoine "hashar" Musso
>>
>> ___
>> OpenStack-Infra mailing list
>> OpenStack-Infra@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
> 
> 
> 


-- 
Sean Dague
http://dague.net





Re: [OpenStack-Infra] anyway to create ENV variables in JJB?

2014-02-24 Thread Sean Dague
On 02/24/2014 04:46 AM, Antoine Musso wrote:
> Le 24/02/2014 02:09, Sean Dague a écrit :
>> After seeing grenade jobs fail the inner / outer timeout because it
>> turns out there are > 5 minutes of time outside the main run job, I
>> pulled together this new approach - https://review.openstack.org/#/c/75726/
>>
>> Ideally it would be nice to not have to keep adding a
>> DEVSTACK_GATE_TIMEOUT into all the jobs, but instead use the existing
>> time definition from devstack-gate.yaml
>>
>>
>>    wrappers:
>>      - timeout:
>>          timeout: 130
>>          fail: true
>>
>> What would be needed would be to make that timeout value available as an
>> environment variable somehow.
>>
>> Any thoughts on how to make that happen?
> 
> Hello,
> 
> The Jenkins "Build Timeout" plugin does not support parameters and would
> fall back to a 3 minute timeout whenever it is passed a non-integer
> value :-(
> 
> Some Java guru might be able to add support to the plugin so it
> recognizes build parameters properly (ex: TIMEOUT).
> 
> 
> On the JJB side, you could use a parameter in a job-template:
> 
> - job-template:
>     name: 'timeout-job'
>     wrappers:
>       - timeout:
>           timeout: "{timeout}"
>           fail: true
> 
> Then invoke the template filling the 'timeout' parameter, either at the
> job level:
> 
> - project:
>     name: project
>     jobs:
>       - timeout-job:
>           timeout: 30
> 
> Or at the project level:
> 
> - project:
>     name: project
>     timeout: 30
>     jobs:
>       - timeout-job
> 
> In both cases you will end up with the following XML:
> 
> <project>
>   ...
>   <buildWrappers>
>     <hudson.plugins.build__timeout.BuildTimeoutWrapper>
>       <timeoutMinutes>30</timeoutMinutes>
>       <failBuild>true</failBuild>
>       ...
>     </hudson.plugins.build__timeout.BuildTimeoutWrapper>
>   </buildWrappers>
> </project>
> 
> cheers,

So I think that's not my question at all.

My concern is that we are specifying the timeout parameter twice in our
yaml files. Once in yaml for jenkins, and once for devstack-gate (where
we manually shave off some number of minutes that we guess is good enough).

I want to not have to specify it a second time, which would be fine if
JJB made timeout available in the Jenkins environment.
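
If it were, the builder side becomes trivial; a sketch assuming a
hypothetical BUILD_TIMEOUT variable (in minutes) exported by the wrapper:

# hypothetical: BUILD_TIMEOUT is NOT something JJB/Jenkins exports today
export DEVSTACK_GATE_TIMEOUT=$((BUILD_TIMEOUT - 10))  # margin for setup/teardown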

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





[OpenStack-Infra] anyway to create ENV variables in JJB?

2014-02-23 Thread Sean Dague
After seeing grenade jobs fail the inner / outer timeout because it
turns out there are > 5 minutes of time outside the main run job, I
pulled together this new approach - https://review.openstack.org/#/c/75726/

Ideally it would be nice to not have to keep adding a
DEVSTACK_GATE_TIMEOUT into all the jobs, but instead use the existing
time definition from devstack-gate.yaml


    wrappers:
      - timeout:
          timeout: 130
          fail: true

What would be needed would be to make that timeout value available as an
environment variable somehow.

Any thoughts on how to make that happen?

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [OpenStack-Infra] [openstack-dev] [TripleO] promoting devtest_seed and devtest_undercloud to voting, + experimental queue for nova/neutron etc.

2014-02-14 Thread Sean Dague
On 02/14/2014 03:43 PM, Robert Collins wrote:
> Thanks to a massive push this week, both the seed *and* undercloud
> jobs are now passing on tripleo-gate nodes, but they are not yet
> voting.
> 
> I'd kind of like to get them voting on tripleo jobs (check only). We
> don't have 2 clouds yet, so if the tripleo ci-cloud suffers a failure,
> we'd have -1's everywhere. I think this would be an ok tradeoff (its
> check after all), but I'd like -infra admin folks opinion on this -
> would it cause operational headaches for you, over and above the
> current risks w/ the tripleo-ci cloud?
> 
> OTOH - we actually got passing ops with a fully deployed virtual cloud
> - which is awesome.
> 
> Now we need to push through to having the overcloud deploy tests pass,
> then the other scenarios we depend on - upgrades w/rebuild, and we'll
> be in good shape to start optimising (pre-heated clouds, local distro
> mirrors etc) and broadening (other distros ...).
> 
> Lastly, I'm going to propose a merge to infra/config to put our
> undercloud story (which exercises the seed's ability to deploy via
> heat with bare metal) as a check experimental job on our dependencies
> (keystone, glance, nova, neutron) - if thats ok with those projects?
> 
> -Rob
> 

My biggest concern with adding this to check experimental is that the
experimental results aren't published back until all the experimental
jobs are done.

We've seen really substantial delays, plus a 5 day complete outage a
week ago, on the tripleo cloud. I'd like to see that much more proven
before it starts to impact core projects, even in experimental.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [OpenStack-Infra] description of gerrit headless users

2014-02-12 Thread Sean Dague
On 02/12/2014 05:13 PM, Elizabeth Krumbach Joseph wrote:
> On Wed, Feb 12, 2014 at 2:03 PM, Jay Buffington  wrote:
>> There are a number of headless users that comment on gerrit.  A few are:
>>
>>* turbo hipster
>>* vmware minesweeper
>>* elastic recheck
>>
>> Is there a wiki page that describes what each of these do and who owns them?
>> If so it'd be great to update the comment template for each of these bots to
>> include a link to their section of that page.
> 
> Someone else can jump in with more thoughts, but currently the easiest
> way to find who owns them is look at the email address associated with
> them in the "Voting Third-Party CI" group in Gerrit:
> 
> https://review.openstack.org/#/admin/groups/91,members
> 
> This doesn't really contain details about what they do, and often
> times we're not told what they do when we set up an account.

I do like the idea of requiring each bot to provide a link back
to a "learn more" page about itself.

I think it would be educational for everyone.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





[OpenStack-Infra] Elastic Search backed up at least 2 hrs

2014-02-11 Thread Sean Dague
It looks like Elastic Search is currently backed up by at least 2 hrs; as
such, elastic recheck is no longer reliably commenting on failures. Any
idea what changed recently that caused this backup? This seems to at
least go back to early yesterday (though I haven't been keeping an eye
on it, so could have been happening earlier than that).

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





[OpenStack-Infra] gerrit idempotent ids on revert?

2014-01-31 Thread Sean Dague
Question: is there any way to get gerrit to generate the idempotent ids
on a revert through the web ui?

Because today it does not. So if you want to amend the commit message it
means you end up pushing a second change.
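
The workaround I know of is to do the revert locally with the Gerrit
commit-msg hook installed, so a Change-Id footer gets generated once and
then survives any number of amends; roughly:

# assumes the Gerrit commit-msg hook is installed in .git/hooks
git revert <sha-of-bad-commit>
# if the revert commit is missing a Change-Id (some git versions don't
# run the commit-msg hook on revert), one amend lets the hook insert it
git commit --amend
git push gerrit HEAD:refs/for/master

Further amends after that keep the same Change-Id, so it stays a single
change in gerrit.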

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [OpenStack-Infra] suggestions for gate optimizations

2014-01-20 Thread Sean Dague
On 01/19/2014 11:38 PM, Joe Gordon wrote:
> 
> 
> 
> On Sun, Jan 19, 2014 at 7:01 AM, Monty Taylor <mord...@inaugust.com> wrote:
> 
>     On 01/19/2014 05:38 AM, Sean Dague wrote:
> 
> So, we're currently 70 deep in the gate, top of queue went in >
> 40 hrs
> ago (probably closer to 50 or 60, but we only have enqueue time
> going
> back to the zuul restart).
> 
> I have a couple of ideas about things we should do based on what
> I've
> seen in the gate during this wedge.
> 
> = Remove reverify entirely =
> 
> 
> Yes. Screw it. In a deep queue like now, it's more generally harmful
> than good.
> 
> 
> I agree with this one, but we should also try to educate the devs,
> because in the case you brought up below it was a core dev who didn't
> examine why his patch failed and if he couldn't do reverify bug, he
> could just do +A.

Sure. My experience at this point is it will only be of mixed success.
There are tons of devs that just say F it and push stuff ahead because
they are busy.

The only way you could really fix something like that would be if +A was
a points system like TCP slow start. Which is this totally other system
you'd have to build. I think fun as a bar conversation, completely
useless in practice.

> Core reviewers can trigger a requeue with +A state changes. Reverify
> right now is exceptionally dangerous in that it lets *any* user put
> something back in the gate, even if it can't pass. There are a
> ton of
> users that believe they are being helpful in doing so, and
> making things
> a ton worse. stable/havana changes being a prime instance.
> 
> If we were being prolog tricky, I'd actually like to make Jenkins -2
> changes need positive run on it before it could be reenqueued. For
> instance, I saw a swift core developer run "reverify bug 123456789"
> again on a change that couldn't pass. While -2s are mostly races
> at this
> point, the team of people that are choosing to ignore them are not
> staying up on what's going on in the queue enough to really know
> whether
> or not trying again is ok.
> 
> = Early Fail Detection =
> 
> With the tempest run now coming in north of an hour, I think we
> need to
> bump up the priority of signally up to jenkins that we're a
> failure the
> first time we see that in the subunit stream. If we fail at 30
> minutes,
> waiting for 60 until a reset is just adding far more delay.
> 
> I'm not really sure how we get started on this one, but I think
> we should.
> 
> 
> This one I think will be helpful, but it also is the one that
> includes that most deep development. Honestly, the chances of
> getting it done this week are almost none.
> 
> That said - I agree we should accelerate working on it. I have
> access to a team of folks in India with both python and java
> backgrounds - if it would be helpful and if we can break out work
> into, you know, assignable chunks, let me know.
> 
> 
> = Pep8 kick out of check =
> 
> I think on the Check Queue we should pep8 first, and not run
> other tests
> until that passes (this reverses a previous opinion I had).
> We're now
> starving nodepool. Preventing taking 5 nodepool nodes on patches
> that
> don't pep8 would be handy. When Dan pushes a 15 patch change
> that fixes
> nova-network, and patch 4 has a pep8 error, we thrash a bunch.
> 
> 
> Agree. I think this might be one of those things that goes back and
> forth on being a good or bad idea over time. I think now is a time
> when it's a good idea.
> 
> 
> 
> What about adding a pre-gate queue that makes sure pep8 and unit tests
> pass before adding a job to the gate (of course this would mean we would
> have to re-run pep8 and unit tests in the gate). Hopefully this would
> reduce the amount of gate thrashing incurred by a gate patch that fails
> one of these jobs.

So this was a check-only statement. This is mostly just about saving
nodes in nodepool. Gate would remain the same.

-Sean

-- 
Sean Dague
http://dague.net





Re: [OpenStack-Infra] suggestions for gate optimizations

2014-01-19 Thread Sean Dague
On 01/19/2014 03:50 PM, Michael Still wrote:
> On Sun, Jan 19, 2014 at 11:01 PM, Monty Taylor  wrote:
>> On 01/19/2014 05:38 AM, Sean Dague wrote:
> 
> [snip]
> 
>>> = Periodic recheck on old changes =
>>>
>>> I think Michael Still said he was working on this one. Certain projects,
>>> like Glance and Keystone, tend to approve things with really stale test
>>> results (> 1 month old). These fail, and then tumble. They are a big
>>> source of the wrecking balls.
>>
>> I believe he's got it working, actually. I think the real trick with this -
>> which I whole-heartedly approve of - is not making node starvation worse.
> 
> Yes, I wrote this on Friday. I wanted to write it as a zuul
> turbo-hipster plugin, but that wasn't possible because of the way
> results are returned in a comment by zuul, so its just a stand along
> python script instead. I've been running it on and off over the
> weekend, but only while I can watch and hand verify what it does to
> build some trust.
> 
> Node starvation is an issue, but we don't seem to be too bad at the
> moment in terms of check queue depth.
> 
>>> Test results > 1 week are clearly irrelevant. For something like nova,
>>> > 3 days can be problematic.
> 
> I am currently triggering on comments on reviews where the jenkins run
> is more than 7 days old. If you want me to tweak the age rule I am
> more than happy to.

Honestly, 7 days is good. I just saw a python-neutronclient change in
the gate that had tests last run in Oct. So let's go with 7 days for now
to get us under control.

Is this running now?

-Sean

-- 
Sean Dague
http://dague.net





[OpenStack-Infra] suggestions for gate optimizations

2014-01-19 Thread Sean Dague
So, we're currently 70 deep in the gate, top of queue went in > 40 hrs
ago (probably closer to 50 or 60, but we only have enqueue time going
back to the zuul restart).

I have a couple of ideas about things we should do based on what I've
seen in the gate during this wedge.

= Remove reverify entirely =

Core reviewers can trigger a requeue with +A state changes. Reverify
right now is exceptionally dangerous in that it lets *any* user put
something back in the gate, even if it can't pass. There are a ton of
users that believe they are being helpful in doing so, and making things
a ton worse. stable/havana changes being a prime instance.

If we were being prolog tricky, I'd actually like to make Jenkins -2
changes need positive run on it before it could be reenqueued. For
instance, I saw a swift core developer run "reverify bug 123456789"
again on a change that couldn't pass. While -2s are mostly races at this
point, the team of people that are choosing to ignore them are not
staying up on what's going on in the queue enough to really know whether
or not trying again is ok.

= Early Fail Detection =

With the tempest run now coming in north of an hour, I think we need to
bump up the priority of signalling up to jenkins that we're a failure the
first time we see that in the subunit stream. If we fail at 30 minutes,
waiting for 60 until a reset is just adding far more delay.

I'm not really sure how we get started on this one, but I think we should.

= Pep8 kick out of check =

I think on the Check Queue we should pep8 first, and not run other tests
until that passes (this reverses a previous opinion I had). We're now
starving nodepool. Preventing taking 5 nodepool nodes on patches that
don't pep8 would be handy. When Dan pushes a 15 patch change that fixes
nova-network, and patch 4 has a pep8 error, we thrash a bunch.

= More aggressive kick out by zuul =

We have issues where projects have racing unit tests, which they've not
prioritized fixing. So those create wrecking balls in the gate.
Previously we've been opposed to kicking those out based on the theory
the patch ahead could be the problem (which I've actually never seen).

However this is actually fixable. We could see if there is anything
ahead of it in zuul that runs the same tests. If not, then it's not
possible that something ahead of it could fix it. This is based on the
same logic zuul uses to build the queue in the first place.

This would shed the wrecking balls earlier.

= Periodic recheck on old changes =

I think Michael Still said he was working on this one. Certain projects,
like Glance and Keystone, tend to approve things with really stale test
results (> 1 month old). These fail, and then tumble. They are a big
source of the wrecking balls.

Test results > 1 week are clearly irrelevant. For something like nova,
> 3 days can be problematic.
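
Finding the stale candidates is cheap via the gerrit query interface; a
rough sketch (age: keys off the change's last update, not the last
Jenkins vote, so it's only an approximation):

# sketch: open changes untouched for 7+ days; age: tracks any update,
# not Jenkins votes, so treat the result set as a first cut
ssh -p 29418 review.openstack.org gerrit query --format=JSON \
    'status:open age:7d' --current-patch-set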

I'm sure there are some other ideas, but I wanted to dump this out while
it was fresh in my brain.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





[OpenStack-Infra] elastic-search delay metrics?

2014-01-13 Thread Sean Dague
I'm doing some fundamental refactors on the ER bot to help us try to
figure out why we are often not tagging bugs that we should be, and have
found that we're no longer really indexing in real time (which may be a
huge part of this).

Basically we've got a more or less hard timeout of 13 minutes (it's up
to 20 attempts with a 40s wait between, for random historical reasons)
from gerrit fail reporting to having the console log indexed in ES. (We
give it another 13 minutes after that to gather all the rest of the job
appropriate logs).

Because of the way we process events, timing out on one fail often means
the next one actually might work, because you'll get 13 minutes from the
time ER looked at your change, not since your change was posted (we're
single threaded in this part of the loop).

What I'm seeing right now is that starting up the bot locally it will
always time out waiting for results of the first failure that it gets,
then if you get lucky, it might classify the 2nd fail.

Given that, we really need to be tracking and alerting on ES delays
somehow, otherwise we're going to lose a lot of the value of this.
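
A crude external probe would already tell us when we're behind; a sketch
(the host/port and the string parsing are illustrative, not our real
endpoint):

# crude lag probe: newest indexed @timestamp vs. the wall clock
latest=$(curl -s 'http://elasticsearch:9200/_search?size=1&sort=@timestamp:desc' |
    grep -o '"@timestamp":"[^"]*"' | head -1 | cut -d'"' -f4)
echo "newest indexed event: $latest / now: $(date -u +%Y-%m-%dT%H:%M:%SZ)"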

-Sean

-- 
Sean Dague
http://dague.net





[OpenStack-Infra] Lost data in elastic search

2014-01-08 Thread Sean Dague
In trying to compute Elastic Recheck failure rates we basically need to 
do the following:


a = get_baseline_all_jobs("""filename:"console.html" AND
    (message:"Finished: FAILURE" OR message:"Finished: SUCCESS")""")
b = get_results_for_er_queries()

a = a.groupby('build_uuid')
b = b.groupby('build_uuid')

a.join(b, on='build_uuid')

In doing so I started running into issues that I was getting far fewer 
failures after the join than I expected.


And what I discovered was that console.html was completely missing from 
the indexes for some build_uuids.


Here is a good example: 
http://logstash.openstack.org/#eyJzZWFyY2giOiJidWlsZF91dWlkOjIxMGI3N2UzZmFhMTQ1ZGQ4ZTE0ZjNhODNiOTdmOTIyIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzg5MTk1MjMzMTY5fQ==


build_uuid:210b77e3faa145dd8e14f3a83b97f922

I'm not sure what the fix is there. We could probably write an audit 
tool to figure out how bad it is.


This would also actually explain something I've noticed where I would 
have expected ER to report on a bug, but it did not, because ER actually 
waits for all expected files to land for a job (console.html is a 
required file) before it reports.


-Sean

--
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



[OpenStack-Infra] metadata on nodepool image version?

2014-01-07 Thread Sean Dague
So looking at - https://bugs.launchpad.net/nova/+bug/1266711, it's a 
90+% fail since midnight, which means I'm going to assume something 
changed in the system at midnight which made this no good (and the few 
successes after this are just old nodes that were still in the build queue).


It would be really great if there was some kind of identifier in the 
image to tell us that we were or were not on the same set of base 
software. Something we could bin against.


Like dpkg -l | sha256sum, that was then added to the metadata in elastic 
search. It wouldn't tell us exactly the differences, but it would 
demonstrate another metric about failure.


If it was put in the console log, crm114 might even pick it up as a likely 
reason for the failure.
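
Something as small as this at image build time, plus one line in the job
header, would be enough to bin on (paths are illustrative):

# at image build time: fingerprint the installed package set
dpkg -l | sha256sum | cut -d' ' -f1 | sudo tee /etc/nodepool/pkg_checksum
# at job start: surface it in the console log for ES / crm114
echo "image-pkg-checksum: $(cat /etc/nodepool/pkg_checksum)"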


    -Sean

--
Sean Dague
http://dague.net



Re: [OpenStack-Infra] What's between us and a gerrit upgrade?

2014-01-03 Thread Sean Dague

On 01/03/2014 12:15 PM, Antoine Musso wrote:

Le 02/01/14 16:35, Sean Dague a écrit :


For instance, Gerrit 2.5 added the ability to put UnifiedDiffs in email
templates, which would let you know if the review in question touched
files that you like to keep an eye on (and trigger local filtering based
on that).



Merlijn van Deen, a volunteer for Wikimedia, wrote a python bot that
would listen for changes and add folks as reviewers.  It even comes with
file filtering.

The configuration is done by editing a wiki page:

   https://www.mediawiki.org/wiki/Gerrit-reviewer-bot

Source code is:

  https://github.com/valhallasw/gerrit-reviewer-bot

Merlijn is on freenode as valhallasw.


Cool stuff. However, with the scale of both repos and reviewers in 
OpenStack, I don't think a global list is really going to work.


Core gerrit actually solves this now (you can watch on a regex), it's 
just about us getting there.


There are lots of ways we can work around gerrit, but the fact that 
these features are built in really means we should spend the effort to 
get there vs. deploying workarounds.


-Sean

--
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



[OpenStack-Infra] What's between us and a gerrit upgrade?

2014-01-02 Thread Sean Dague
So as the review queues grow, and our gerrit becomes longer and longer
in the tooth, we are now getting to the point where people are spending
real time building local workarounds for features lacking in gerrit
2.4 that have since been implemented upstream.

For instance, Gerrit 2.5 added the ability to put UnifiedDiffs in email
templates, which would let you know if the review in question touched
files that you like to keep an eye on (and trigger local filtering based
on that).

Gerrit 2.8 implements a secondary index mechanism that means we could
get file regex in the web ui for reviews.
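
For example, with the secondary index in place a reviewer could search
something like:

  status:open file:"^nova/compute/.*"

(operator syntax per my reading of the 2.8 docs, so treat as approximate;
none of this works on our 2.4 today).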

We also apparently got custom dashboards somewhere along the way.

What's actually currently blocking the upgrade? Is it just time? or are
we still waiting on feature(s)? If so, do those feature(s) outweigh what
we are missing by being so far behind master?

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net





Re: [OpenStack-Infra] Week-end project

2013-12-19 Thread Sean Dague
On 12/19/2013 05:58 AM, Thierry Carrez wrote:
> Sean Dague wrote:
>> I think there are 2 approaches that I see being fruitful, depending on
>> the kind of problem the team is going after.
>>
>> 1) the yaml -> ical converter.
>>
>> Bulk of invention is going to be on the converter, especially
>> translating into ical recurrence rules. Also will probably want / need
>> to build an HTML UI for the end result so people can actually see it on
>> a webpage as well.
>>
>> 2) drupal + calendar + workflow
>>
>> I was actually thinking about what ttx said about no tool existing out
>> there to be able to take calendar updates into an approval queue. I
>> think you could actually build that pretty easily with drupal base site
>> (logins connected to lp openid) + calendar modules + workflow module
>> (that allows for approval queues on changes).
>>
>> Different set of things to learn (more on the drupal side), however the
>> advantages would be that a lot of the UI and ical bits would be handled
>> already.
> 
> So.. approach 2 would definitely be more friendly for non-devs, but what
> I like about option 1 (in addition to its lack of specific
> infrastructure setup) is that you can check the proposed change for
> future conflicts before it is even reviewed and at merge time, using the
> same check/gate mechanisms we have for everything else. It sounds a lot
> more difficult to set up such automated verification in the drupal case,
> so that would push the conflict validation onto the human reviewing the
> change...

That's definitely true. There would be a visual way to review, which
would be helpful.

Anyway, I'd let the team doing the implementation figure out which set
of problems they wanted to solve. Both would be massive improvements
from where we currently stand.

In my own nirvana I want a site I can go to, log in, tell it which
meetings I care about, and it builds me a custom ical feed that I can
link into my google calendar, and I'm done. Bonus if we actually got
people updating agendas in there. It feels like we could get there if we
started with a web stack that understood calendaring already. It's a ton
of work to get there from scratch. I've done calendaring up from scratch
before - https://github.com/sdague/inviter - it's all doable. It's just
way more than a weekend project, and way more about dealing with the
craziness of ical itself.

-Sean

-- 
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Week-end project

2013-12-17 Thread Sean Dague
On 12/17/2013 04:25 AM, Thierry Carrez wrote:
> Mathew R Odden wrote:
>> Not to hijack the topic, but I have a team of university students that I
>> am looking for an OpenStack related blueprint/project for them to work
>> on. I think this would be something that wouldn't be too hard for them
>> to jump into and achieve some results for the duration of their team
>> project.
>>
>> I also think it would be an extremely useful utility for managing the
>> growing amount of meetings.
> 
> I think that would be a good project for them (limited scope,
> self-contained), unless you want them to get familiar with OpenStack
> itself (rather than our development infrastructure).
> 
> They might need mentoring though, especially to understand
> gerrit/zuul/jobs which is a pretty essential part of the process (check
> jobs to check availability and file syntax, post-merge jobs to refresh
> the ICS and get it published in human-readable fashion). I can help as
> the "customer" expressing feature requests, but can't spend too much
> time mentoring.
> 
> Let us know if they grab it so that we don't spend more time on it.

I think it's a great idea.

I think there are 2 approaches that I see being fruitful, depending on
the kind of problem the team is going after.

1) the yaml -> ical converter.

Bulk of invention is going to be on the converter, especially
translating into ical recurrence rules. Also will probably want / need
to build an HTML UI for the end result so people can actually see it on
a webpage as well.

2) drupal + calendar + workflow

I was actually thinking about what ttx said about no tool existing out
there to be able to take calendar updates into an approval queue. I
think you could actually build that pretty easily with drupal base site
(logins connected to lp openid) + calendar modules + workflow module
(that allows for approval queues on changes).

Different set of things to learn (more on the drupal side), however the
advantages would be that a lot of the UI and ical bits would be handled
already.

I've got experience both ways, so hit me up on irc with questions on
either approach.
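
To make approach 1 concrete, here is a minimal sketch using the python
icalendar library (the meeting fields, dates, and file name here are
assumptions for illustration, not a settled format):

  # Minimal sketch of the yaml -> ical idea: emit one repeating VEVENT
  # per meeting. Field names and the output file are hypothetical.
  from datetime import datetime, timezone
  from icalendar import Calendar, Event

  def meeting_to_event(name, byday, hour):
      event = Event()
      event.add('summary', name)
      # First occurrence (UTC, per the all-times-in-UTC convention);
      # the RRULE makes it repeat weekly on that weekday.
      event.add('dtstart', datetime(2014, 1, 7, hour, 0, tzinfo=timezone.utc))
      event.add('rrule', {'freq': 'weekly', 'byday': byday})
      return event

  cal = Calendar()
  cal.add_component(meeting_to_event('Nova team meeting', 'TU', 21))
  with open('meetings.ics', 'wb') as f:
      f.write(cal.to_ical())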

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Week-end project

2013-12-12 Thread Sean Dague
On 12/12/2013 05:04 AM, Thierry Carrez wrote:
> Chmouel Boudjnah wrote:
>> On 12 Dec 2013, at 10:21, Thierry Carrez  wrote:
>>
>>> The format of the YAML file would limit errors (all times in UTC,
>>
>> What would be the YAML format ? something like this ? [1]:
>>
>> openstack-meeting:
>>   - nova:
>>     - time: 21:00
>>     - occurrence: weekly
>>     - day: tuesday
>>   - swift:
>>     - time: 20:00
>>     - occurrence: bi-weekly
>>     - day: wednesday
>>
>> openstack-meeting-alt:
>>   - trove:
>>     - time: 21:00
>>     - day: wednesday
>>     - occurrence: bi-weekly
>>
>> Chmouel
>>
>> [1] Just kicking the discussion, didn’t enrol for that (yet) :)
> 
> Looking at the meetings wiki page and what we need to recreate it,
> probably more like:
> 
> - Nova team meeting:
>   - day: Thursday
>   - time: 21:00
>   - repeats: weekly
>   - chair: Russell Bryant
> - Translations team meeting:
>   - day: Wednesday
>   - time: 00:00
>   - alternatetime: 16:00
>   - firstdate: 2013-12-10
>   - repeats: rotatingweekly
>   - channel: #openstack-meeting-alt
>   - wiki: Meetings/Translations

Honestly, translating to ical recurrence rules is going to be wonky
enough (I used to maintain the ruby icalendar gem, and have done a
bunch of fixing of the drupal implementation) that I'd just say do this
in native ical with a validator.

ical is ugly, but it's mostly understandable (and I'd be happy to mentor
on it). For the intents and purposes here, it would probably be fine.
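
A minimal validator sketch along those lines (file name hypothetical;
the python icalendar library does the actual parsing):

  # Fail loudly if the file is not parseable iCalendar; a check job
  # could run exactly this.
  from icalendar import Calendar

  with open('meetings.ics', 'rb') as f:
      Calendar.from_ical(f.read())  # raises ValueError on bad input
  print('meetings.ics parses as valid iCalendar')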

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] All clear on novaclient ui changes

2013-11-04 Thread Sean Dague
It came up at dinner tonight that folks were sitting on novaclient UI
changes for fear they might break the gate (as happened last week). As
of last Thursday we managed to land the right changes to
devstack-gate-wrap.sh, which prevent that from happening, as demonstrated
with https://review.openstack.org/#/c/54448/ (the Revert of the Revert
of the page that wedged us).

How to land a breaking UI change?


In order to land this change, it needs to pass devstack exercises in
grizzly, which basically means fixing this line -
https://github.com/openstack-dev/devstack/blob/stable/grizzly/exercises/aggregates.sh#L101
to be more permissive for the new UI format.  (It's L104 in master, and
probably should be synced across master, havana, & grizzly.)

Once that is done, the change could go forward, and should pass grenade.

-Sean

-- 
Sean Dague
http://dague.net



___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Log storage/serving

2013-10-15 Thread Sean Dague
1) Compression - logs are stored gzipped on disk, and served wire
compressed, automatically, based on your client's ability to handle the
compression.


2) Dynamic Filtering - we added the level= parameter to the wsgi script
to speed up logstash indexing, as it turns out that python is infinitely
faster at throwing away DEBUG lines than logstash was. People love it
too, because
http://logs.openstack.org/70/44670/3/gate/gate-tempest-devstack-vm-neutron/02d68e3/logs/screen-q-svc.txt.gz?level=TRACE
loads super quick, and lets you see where the top issues are.
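
For the curious, the core of the level= trick is tiny. A rough sketch
(the names here are made up; the real code lives in the infra
log-serving wsgi script):

  # Keep a log line if it has no level marker, or its level is at
  # least the requested threshold. OpenStack log lines embed the
  # level as a standalone word.
  LEVELS = ['DEBUG', 'INFO', 'AUDIT', 'TRACE', 'WARNING', 'ERROR']

  def filter_lines(lines, min_level):
      threshold = LEVELS.index(min_level)
      for line in lines:
          level = next((l for l in LEVELS if ' %s ' % l in line), None)
          if level is None or LEVELS.index(level) >= threshold:
              yield line

  # e.g. filter_lines(logfile, 'TRACE') drops DEBUG/INFO/AUDIT lines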


There are a few other interesting facts that we discovered in this
process - n-cpu on a nova-network run comes in at about 5 MB gzipped (40
MB uncompressed) of HTML once we do our filtering on it. If you are
running a browser other than Chrome on a nice Intel chip, life isn't
good. A future enhancement here is to be nicer to people and disable
DEBUG by default if the file size is too big.


A 40 MB HTML file means that client-side filtering would be problematic.
First off, you take the huge network hit anyway; secondly, I expect the
DOM manipulation at that level of complexity would give even Chrome a
run for its money.


And then there is just the nice idea of keeping the raw artifact and the
presentation layer separate. Being able to update our presentation
filter so that last week's logs, which we are still using to debug
issues, get easier to read is a good thing.


So regardless of the eventual solution here, I *really* want the ability
to have a presentation layer filter between the raw logs and the
clients. HTTP has so many nice negotiation features worked into the
spec, which we're actually using today and which make life easier for
folks. And I'd really like to not lose that.
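
As a sketch of why that server-side layer is cheap to keep (paths are
assumed; this is not the actual infra script), the negotiation piece is
roughly:

  # Serve a stored .gz either as-is (Content-Encoding: gzip) or
  # decompressed, depending on what the client advertises.
  import gzip

  def app(environ, start_response):
      accepts_gzip = 'gzip' in environ.get('HTTP_ACCEPT_ENCODING', '')
      path = '/srv/logs' + environ['PATH_INFO']  # hypothetical log root
      with open(path + '.gz', 'rb') as f:
          body = f.read()
      headers = [('Content-Type', 'text/plain')]
      if accepts_gzip:
          headers.append(('Content-Encoding', 'gzip'))
      else:
          body = gzip.decompress(body)
      start_response('200 OK', headers)
      return [body]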


So 2a has a strong vote from me.

-Sean

--
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Speeding up storyboard (was Re: On being an OpenID consumer instead of an OpenID producer.)

2013-09-25 Thread Sean Dague

On 09/25/2013 05:44 AM, Thierry Carrez wrote:

Stefano Maffulli wrote:

On 09/24/2013 04:47 PM, Monty Taylor wrote:

To be clear, I think it's a TERRIBLE idea to move half-way while we
still require launchpad ids for bugs.


Let's hurry up with Storyboard then: the Foundation is setting the
budget for 2014, although I wonder if we can outsource that
development... ideas?


I don't think we should "outsource" it, since managing that outsourcing
so that it produces the desired results would be a waste of time for the
key people involved (compared to openly developing it).

I would rather use the money (if any) to fund some positions that would
free some time from the people that already work on it.


+1

For Storyboard to be successful, it needs to evolve in the spirit of the
rest of the infrastructure we've built in OpenStack, building a
community around it along the way. "Outsource" is the wrong model.


-Sean

--
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] Smokestack posting +/-2 votes

2013-08-10 Thread Sean Dague
> Perhaps that job was before its time though, and at the time the community
> seemed violently against using packages of any sort in upstream. Two years
> later: if there is an open door here I'd like to pursue it... it is a lot of 
> work though and I certainly think there is a good bit to learn from the 
> workflow we use in SmokeStack now. In the meantime there is still a good bit 
> of ground to hold with SmokeStack which isn't easily swapped out.
>
> So this is encouraging... and we have lots of work to do. :)
>
>
>>
>> The problem, as I understand it, is that we can't run Xen nested on any
>> of our current cloud providers.  If there is no way to do that, then I
>> believe we would need a new source of test resources for this.  I think
>> we talked about this a bit at the summit, but I think in general if
>> someone wants to provide new testing resources, they would need to be
>> sufficient to support our resource usage (which is large and growing),
>> supported by a real ops team (because we aren't one) with something like
>> an SLA.  We discussed some ideas around bare metal testing at the summit
>> here: https://etherpad.openstack.org/havana-bare-metal-testing
>>
>> Even without gating, the advisory reporting from smokestack is hugely
>> valuable.  I think any increase in reliability and speed (such as
>> automating the failure detection as you were talking about) will be
>> perceived by the review community and they will act accordingly, giving
>> it significant weight in their reviews.
>
> Exactly.
>
>>
>> -Jim
>>
>> ___
>> OpenStack-Infra mailing list
>> OpenStack-Infra@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
>>
>
> ___
> OpenStack-Infra mailing list
> OpenStack-Infra@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra



-- 
Sean Dague
http://dague.net

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [OpenStack-Infra] [openstack-qa] Full Tempest in Gate review

2013-02-01 Thread Sean Dague
Can we pin it down to a particular one of the clouds? I know there was
concern the Rackspace cloud was extra slow here, so it would be good to
figure out where this got monumentally worse.
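
A quick sketch of how to quantify the spread from the console logs
quoted below (assuming they're captured to a file of lines like
'... | Ran 567 tests in 2926.770s'; the file name is hypothetical):

  import re

  times = []
  with open('tempest_runs.log') as f:  # hypothetical grep capture
      for line in f:
          m = re.search(r'Ran \d+ tests in ([\d.]+)s', line)
          if m:
              times.append(float(m.group(1)))

  print('min %.1f / max %.1f / mean %.1f minutes' % (
      min(times) / 60, max(times) / 60, sum(times) / len(times) / 60))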


-Sean

On 02/01/2013 10:42 AM, David Kranz wrote:

Sean, thanks for pushing this through. I am a little concerned that the
extra time this adds to all project gates is more than was advertised in
the checkin comment. Here are the "full tempest" times for the last 25
hourly runs. As you can see,
the added time varies from 22 minutes to an hour and 4 minutes.

  -David

log1900:2013-01-31 14:12:26.617 | Ran 567 tests in 2926.770s
log1901:2013-01-31 14:37:31.720 | Ran 567 tests in 1320.037s
log1902:2013-01-31 15:46:24.235 | Ran 567 tests in 1611.961s
log1903:2013-01-31 16:38:59.399 | Ran 567 tests in 1345.720s
log1904:2013-01-31 17:37:41.749 | Ran 567 tests in 1323.860s
log1905:2013-01-31 19:32:22.820 | Ran 567 tests in 3755.988s
log1906:2013-01-31 20:03:32.521 | Ran 567 tests in 2439.092s
log1907:2013-01-31 21:06:56.867 | Ran 567 tests in 2573.447s
log1908:2013-01-31 22:24:21.428 | Ran 567 tests in 3441.703s
log1909:2013-01-31 23:08:37.103 | Ran 567 tests in 2777.078s
log1910:2013-01-31 23:46:42.917 | Ran 567 tests in 1606.375s
log1911:2013-02-01 01:08:03.059 | Ran 567 tests in 2735.490s
log1912:2013-02-01 02:09:38.766 | Ran 567 tests in 2782.098s
log1913:2013-02-01 03:32:21.960 | Ran 567 tests in 3758.148s
log1914:2013-02-01 04:27:28.724 | Ran 567 tests in 3564.568s
log1915:2013-02-01 05:08:08.436 | Ran 567 tests in 2739.774s
log1916:2013-02-01 05:45:41.397 | Ran 567 tests in 1609.954s
log1917:2013-02-01 07:29:19.243 | Ran 567 tests in 3651.127s
log1918:2013-02-01 08:10:40.330 | Ran 567 tests in 2827.667s
log1919:2013-02-01 08:40:55.020 | Ran 567 tests in 1350.462s
log1920:2013-02-01 10:28:20.122 | Ran 567 tests in 3600.405s
log1921:2013-02-01 11:05:13.699 | Ran 567 tests in 2490.120s
log1922:2013-02-01 12:34:33.818 | Ran 567 tests in 3883.574s
log1923:2013-02-01 12:40:23.788 | Ran 567 tests in 1372.075s


___
openstack-qa mailing list
openstack...@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-qa




--
Sean Dague
IBM Linux Technology Center
email: sda...@linux.vnet.ibm.com
alt-email: slda...@us.ibm.com


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


[OpenStack-Infra] Full Tempest in Gate review

2013-01-29 Thread Sean Dague

This is up for review now - https://review.openstack.org/#/c/20762/

It would turn on full tempest in the gate, assuming all the affected
PTLs sign off on it. Feedback welcomed.


-Sean

--
Sean Dague
IBM Linux Technology Center
email: sda...@linux.vnet.ibm.com
alt-email: slda...@us.ibm.com


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra