Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-09-01 Thread Emilien Macchi
On Fri, Sep 1, 2017 at 8:27 AM, Marios Andreou  wrote:
> I don't think this ^^ patch is what causes/ed that bug, at least I haven't
> found or seen that this is the case. As we discussed a bit on Wednesday
> evening, if you see the elastic-recheck query @
> http://status.openstack.org/elastic-recheck/#1713832 the issue was seen
> twice in the last 24 hours and 5 times since the revert landed early
> Wednesday EU morning. I think that's roughly the same rate it was being seen
> before and after my patch was reverted.
>
> Do you think we can now consider landing that again with
> https://review.openstack.org/#/c/499116/ please?

At this point I would say no.
We have merged so much lately because of the release that we
don't even know why this broke.
Is it in Zaqar? In Swift? In TripleO?

So yes, maybe it's not your patch, but it may or may not be related.
Since nobody is able to tell, I would suggest not reverting the
revert until we find the root cause and fix it.
Otherwise, we'll merge your patch again and possibly
increase the number of hits in the gate, which is something
we don't want at this stage.

When we discussed it, you told me this patch wasn't required for the
upgrade but was an enhancement. At this stage of the cycle we are looking
for stability, not enhancements; I hope you understand that.
It's a difficult choice for me to make, but from now on I'd prefer that we
fix bugs before we continue merging new features.

Keep in mind the feature freeze and why we're doing this.

> I also see you assigned me that bug - I'll try and have another look at it
> next week but we may want to reach out to someone from zaqar to see if they
> have any ideas about why that happens.

Thank you, that would be very helpful.
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-09-01 Thread Marios Andreou
On Wed, Aug 30, 2017 at 2:17 AM, Emilien Macchi  wrote:

> We are currently dealing with 4 issues and until they are fixed, please
> do not approve any patch. We want to keep the gate clear to merge the
> fixes for the 4 problems first.
>
> 1) devstack-gate broke us because we use it as a library (bad)
> https://bugs.launchpad.net/tripleo/+bug/1713868
>
> 2) https://review.openstack.org/#/c/474578/ broke us and we're
> reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
>
>
I don't think this ^^ patch is what causes/ed that bug, at least I haven't
found or seen that this is the case. As we discussed a bit on Wednesday
evening, if you see the elastic-recheck query @
http://status.openstack.org/elastic-recheck/#1713832 the issue was seen
twice in the last 24 hours and 5 times since the revert landed early
Wednesday EU morning. I think that's roughly the same rate it was being
seen before and after my patch was reverted.

Do you think we can now consider landing that again with
https://review.openstack.org/#/c/499116/ please?
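The before/after argument above amounts to comparing hit rates over two equal windows around the revert. A minimal Python sketch, assuming the hit timestamps could be exported from the elastic-recheck graph; the timestamps and revert time below are hypothetical, chosen only to show the arithmetic, and are not the real hits for bug 1713832.

```python
from datetime import datetime, timedelta


def hits_per_day(hits, start, end):
    """Average number of hits per day in the half-open window [start, end)."""
    days = (end - start) / timedelta(days=1)
    count = sum(1 for t in hits if start <= t < end)
    return count / days


# Hypothetical data: NOT the real hits for bug 1713832.
revert = datetime(2017, 8, 30, 6, 0)
window = timedelta(days=3)
hits = [revert - timedelta(hours=h) for h in (5, 20, 30, 44, 60)]
hits += [revert + timedelta(hours=h) for h in (10, 26, 40, 55, 70)]

before = hits_per_day(hits, revert - window, revert)
after = hits_per_day(hits, revert, revert + window)
print(f"before revert: {before:.2f}/day, after revert: {after:.2f}/day")
```

Similar rates before and after would suggest the revert did not change the failure rate, although on their own they do not identify the root cause.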

I also see you assigned me that bug - I'll try and have another look at it
next week but we may want to reach out to someone from zaqar to see if they
have any ideas about why that happens.

have a good weekend all

marios



> 3) We shouldn't build images on multinode jobs
> https://bugs.launchpad.net/tripleo/+bug/1713167
>
> 4) We should use pip instead of git for delorean
> https://bugs.launchpad.net/tripleo/+bug/1708832
>
>
> Until further notice from Alex or myself, please do not approve any patch.
>
> Thanks,
> --
> Emilien Macchi


Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-31 Thread Jiří Stránský

On 30.8.2017 06:54, Emilien Macchi wrote:

> On Tue, Aug 29, 2017 at 4:17 PM, Emilien Macchi  wrote:
>
>> We are currently dealing with 4 issues and until they are fixed, please
>> do not approve any patch. We want to keep the gate clear to merge the
>> fixes for the 4 problems first.
>>
>> 1) devstack-gate broke us because we use it as a library (bad)
>> https://bugs.launchpad.net/tripleo/+bug/1713868
>>
>> 2) https://review.openstack.org/#/c/474578/ broke us and we're
>> reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
>>
>> 3) We shouldn't build images on multinode jobs
>> https://bugs.launchpad.net/tripleo/+bug/1713167
>>
>> 4) We should use pip instead of git for delorean
>> https://bugs.launchpad.net/tripleo/+bug/1708832
>>
>>
>> Until further notice from Alex or myself, please do not approve any patch.
>
>
> The 4 problems have been mitigated.
> You can now proceed to normal review.
>
> Please do not recheck a patch without an elastic-recheck comment, we
> need to track all issues related to CI from now.
> Paul Belanger has been doing extremely useful work to help us, now
> let's use elastic-recheck more and stop blind rechecks.
> All known issues are in http://status.openstack.org/elastic-recheck/
> If one is missing, you're welcome to contribute by sending a patch to
> elastic-recheck. Example with https://review.openstack.org/#/c/498954/


Posted DLRN build failure query [1]. I used the Kibana interface [2] to 
test-drive the query.


I wanted to tackle other bugs, but it seems we don't have enough info in
console.html. I wonder if it's realistic to start pulling some logs,
maybe from the undercloud/home/jenkins dir, into logstash? That's where
OOOQ puts most of its more detailed output, so having that might allow
us to produce more specific queries.


Thanks,

Jirka

[1] https://review.openstack.org/499532
[2] http://logstash.openstack.org



> I've restored all patches that were killed from the gate and did
> recheck already, hopefully we can get some merges and finish this
> release.
>
> Thanks Paul and all Infra for their consistent help!






Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-31 Thread Michele Baldessari
On Thu, Aug 31, 2017 at 10:55:34AM +0200, Bogdan Dobrelya wrote:
> On 31.08.2017 10:33, Michele Baldessari wrote:
> > On Wed, Aug 30, 2017 at 11:31:14AM +0200, Bogdan Dobrelya wrote:
> >> On 30.08.2017 6:54, Emilien Macchi wrote:
> >>> On Tue, Aug 29, 2017 at 4:17 PM, Emilien Macchi  
> >>> wrote:
> >>>> We are currently dealing with 4 issues and until they are fixed, please
> >>>> do not approve any patch. We want to keep the gate clear to merge the
> >>>> fixes for the 4 problems first.
> >>>>
> >>>> 1) devstack-gate broke us because we use it as a library (bad)
> >>>> https://bugs.launchpad.net/tripleo/+bug/1713868
> >>>>
> >>>> 2) https://review.openstack.org/#/c/474578/ broke us and we're
> >>>> reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
> >>>>
> >>>> 3) We shouldn't build images on multinode jobs
> >>>> https://bugs.launchpad.net/tripleo/+bug/1713167
> >>>>
> >>>> 4) We should use pip instead of git for delorean
> >>>> https://bugs.launchpad.net/tripleo/+bug/1708832
> >>>>
> >>>>
> >>>> Until further notice from Alex or myself, please do not approve any
> >>>> patch.
> >>>
> >>> The 4 problems have been mitigated.
> >>> You can now proceed to normal review.
> >>>
> >>> Please do not recheck a patch without an elastic-recheck comment, we
> >>> need to track all issues related to CI from now.
> >>> Paul Belanger has been doing extremely useful work to help us, now
> >>> let's use elastic-recheck more and stop blind rechecks.
> >>> All known issues are in http://status.openstack.org/elastic-recheck/
> >>> If one is missing, you're welcome to contribute by sending a patch to
> >>> elastic-recheck. Example with https://review.openstack.org/#/c/498954/
> >>
> >> That's a great example! Let me follow up on that and share my beginner's
> >> experience as well.
> >>
> >> Let's help improve elastic-recheck queries to identify those
> >> unknown or new failures; this is really important. This also builds
> >> domain knowledge in particular areas, whether openstack, *-infra, or
> >> tripleo specific.
> >>
> >> As beginners, we could start by watching for failing tripleo-ci
> >> periodic jobs [0],[1] (available as RSS feeds) and gate jobs without
> >> e-r comments, which are also listed on [2].
> >>
> >> Then we could fetch the logs locally with tools like getthelogs [3],
> >> or look into logs.openstack.org directly, if advanced beginners wish to.
> >>
> >> Finally, we could identify the error-like patterns we discover (just
> >> do some grep, like I do with my tool [4]) and help with root cause
> >> analysis. Ideally, we would also submit new e-r queries (see also [5])
> >> and the corresponding lp bugs, and, absolutely ideally, help address
> >> those as well. This might be hard, though, as we may not be experts
> >> in some of these areas. Some of the error messages would literally
> >> mean nothing to us. For me, the most  But as a best effort, we
> >> could invite the right people to look into that, or at least ask
> >> folks on #tripleo or #openstack-infra.
> >>
> >> [0]
> >> http://status.openstack.org/openstack-health/#/g/project/openstack-infra~2Ftripleo-ci
> >> [1]
> >> http://status.openstack.org/openstack-health/#/g/project/openstack~2Ftripleo-quickstart
> >> [2] http://status.openstack.org/elastic-recheck/data/others.html
> >> [3] https://review.openstack.org/#/c/492178/
> >> [4] 
> >> https://github.com/bogdando/fuel-log-parse/blob/master/fuel-log-parse.sh
> >> [5]
> >> https://docs.openstack.org/infra/elastic-recheck/readme.html#running-queries-locally
> > 
> > Thanks Bogdan, this is very helpful. Do we have some docs/readme on [5]?
> > It is failing here with a bunch of 404s, so I presume I am missing a
> > proper elasticRecheck.conf file or some other settings?
> > 
> > I was basically trying to validate https://review.openstack.org/#/c/499516/
> > before submitting it.
> 
> There are install docs [0] :)
> Although in my case, the following worked as well (given
> VENV=${HOME}/.virtualenvs):
> 
> $ mkvirtualenv erqtest
> $ pip install -r requirements.txt
> $ python setup.py develop
> $ ${VENV}/erqtest/bin/elastic-recheck-query queries/foo.yaml
> 
> [0] https://docs.openstack.org/infra/elastic-recheck/installation.html

:) I had gotten that far.

The usual python setup.py build + install in a new venv still leaves me
with 404s all over:
$ elastic-recheck-query queries/1713832.yaml
2017-08-31 11:33:04  DEBUG    [urllib3.util.retry] Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0)
2017-08-31 11:33:04  WARNING  [elasticsearch  ] GET /logstash-2017.08.31/_status [status:404 request:0.567s]
2017-08-31 11:33:04  DEBUG    [elasticsearch  ] >
2017-08-31 11:33:04  DEBUG    [elasticsearch  ] <   404 Not Found
Not Found
The requested URL /logstash-2017.08.31/_status was not found on this server.
Apache/2.4.7 (Ubuntu) Server at logstash.openstack.org Port 80
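The failing request in the log above targets a per-day index, /logstash-2017.08.31/_status. Assuming every day's logs live in an index named logstash-YYYY.MM.DD (inferred only from that one URL), a small Python sketch that lists the index names a query window would touch, e.g. to check them one at a time while debugging the 404s:

```python
from datetime import date, timedelta


def logstash_indices(last_day, days):
    """Per-day index names covering `days` days, ending at `last_day`.

    Assumes the logstash-YYYY.MM.DD naming seen in the 404 above."""
    return [
        "logstash-" + (last_day - timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days)
    ]


for name in logstash_indices(date(2017, 8, 31), 3):
    print(name)
```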

Whereas the develop command fails me on:
Processing dependencies 

Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-31 Thread Bogdan Dobrelya
On 31.08.2017 10:33, Michele Baldessari wrote:
> On Wed, Aug 30, 2017 at 11:31:14AM +0200, Bogdan Dobrelya wrote:
>> On 30.08.2017 6:54, Emilien Macchi wrote:
>>> On Tue, Aug 29, 2017 at 4:17 PM, Emilien Macchi  wrote:
>>>> We are currently dealing with 4 issues and until they are fixed, please
>>>> do not approve any patch. We want to keep the gate clear to merge the
>>>> fixes for the 4 problems first.
>>>>
>>>> 1) devstack-gate broke us because we use it as a library (bad)
>>>> https://bugs.launchpad.net/tripleo/+bug/1713868
>>>>
>>>> 2) https://review.openstack.org/#/c/474578/ broke us and we're
>>>> reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
>>>>
>>>> 3) We shouldn't build images on multinode jobs
>>>> https://bugs.launchpad.net/tripleo/+bug/1713167
>>>>
>>>> 4) We should use pip instead of git for delorean
>>>> https://bugs.launchpad.net/tripleo/+bug/1708832
>>>>
>>>>
>>>> Until further notice from Alex or myself, please do not approve any patch.
>>>
>>> The 4 problems have been mitigated.
>>> You can now proceed to normal review.
>>>
>>> Please do not recheck a patch without an elastic-recheck comment, we
>>> need to track all issues related to CI from now.
>>> Paul Belanger has been doing extremely useful work to help us, now
>>> let's use elastic-recheck more and stop blind rechecks.
>>> All known issues are in http://status.openstack.org/elastic-recheck/
>>> If one is missing, you're welcome to contribute by sending a patch to
>>> elastic-recheck. Example with https://review.openstack.org/#/c/498954/
>>
>> That's a great example! Let me follow up on that and share my beginner's
>> experience as well.
>>
>> Let's help improve elastic-recheck queries to identify those
>> unknown or new failures; this is really important. This also builds
>> domain knowledge in particular areas, whether openstack, *-infra, or
>> tripleo specific.
>>
>> As beginners, we could start by watching for failing tripleo-ci
>> periodic jobs [0],[1] (available as RSS feeds) and gate jobs without
>> e-r comments, which are also listed on [2].
>>
>> Then we could fetch the logs locally with tools like getthelogs [3],
>> or look into logs.openstack.org directly, if advanced beginners wish to.
>>
>> Finally, we could identify the error-like patterns we discover (just
>> do some grep, like I do with my tool [4]) and help with root cause
>> analysis. Ideally, we would also submit new e-r queries (see also [5])
>> and the corresponding lp bugs, and, absolutely ideally, help address
>> those as well. This might be hard, though, as we may not be experts
>> in some of these areas. Some of the error messages would literally
>> mean nothing to us. For me, the most  But as a best effort, we
>> could invite the right people to look into that, or at least ask
>> folks on #tripleo or #openstack-infra.
>>
>> [0]
>> http://status.openstack.org/openstack-health/#/g/project/openstack-infra~2Ftripleo-ci
>> [1]
>> http://status.openstack.org/openstack-health/#/g/project/openstack~2Ftripleo-quickstart
>> [2] http://status.openstack.org/elastic-recheck/data/others.html
>> [3] https://review.openstack.org/#/c/492178/
>> [4] https://github.com/bogdando/fuel-log-parse/blob/master/fuel-log-parse.sh
>> [5]
>> https://docs.openstack.org/infra/elastic-recheck/readme.html#running-queries-locally
> 
> Thanks Bogdan, this is very helpful. Do we have some docs/readme on [5]?
> It is failing here with a bunch of 404s, so I presume I am missing a
> proper elasticRecheck.conf file or some other settings?
> 
> I was basically trying to validate https://review.openstack.org/#/c/499516/
> before submitting it.

There are install docs [0] :)
Although in my case, the following worked as well (given
VENV=${HOME}/.virtualenvs):

$ mkvirtualenv erqtest
$ pip install -r requirements.txt
$ python setup.py develop
$ ${VENV}/erqtest/bin/elastic-recheck-query queries/foo.yaml

[0] https://docs.openstack.org/infra/elastic-recheck/installation.html

> 
> Thanks,
> Michele
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando



Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-31 Thread Michele Baldessari
On Wed, Aug 30, 2017 at 11:31:14AM +0200, Bogdan Dobrelya wrote:
> On 30.08.2017 6:54, Emilien Macchi wrote:
> > On Tue, Aug 29, 2017 at 4:17 PM, Emilien Macchi  wrote:
> >> We are currently dealing with 4 issues and until they are fixed, please
> >> do not approve any patch. We want to keep the gate clear to merge the
> >> fixes for the 4 problems first.
> >>
> >> 1) devstack-gate broke us because we use it as a library (bad)
> >> https://bugs.launchpad.net/tripleo/+bug/1713868
> >>
> >> 2) https://review.openstack.org/#/c/474578/ broke us and we're
> >> reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
> >>
> >> 3) We shouldn't build images on multinode jobs
> >> https://bugs.launchpad.net/tripleo/+bug/1713167
> >>
> >> 4) We should use pip instead of git for delorean
> >> https://bugs.launchpad.net/tripleo/+bug/1708832
> >>
> >>
> >> Until further notice from Alex or myself, please do not approve any patch.
> > 
> > The 4 problems have been mitigated.
> > You can now proceed to normal review.
> > 
> > Please do not recheck a patch without an elastic-recheck comment, we
> > need to track all issues related to CI from now.
> > Paul Belanger has been doing extremely useful work to help us, now
> > let's use elastic-recheck more and stop blind rechecks.
> > All known issues are in http://status.openstack.org/elastic-recheck/
> > If one is missing, you're welcome to contribute by sending a patch to
> > elastic-recheck. Example with https://review.openstack.org/#/c/498954/
> 
> That's a great example! Let me follow up on that and share my beginner's
> experience as well.
> 
> Let's help improve elastic-recheck queries to identify those
> unknown or new failures; this is really important. This also builds
> domain knowledge in particular areas, whether openstack, *-infra, or
> tripleo specific.
> 
> As beginners, we could start by watching for failing tripleo-ci
> periodic jobs [0],[1] (available as RSS feeds) and gate jobs without
> e-r comments, which are also listed on [2].
> 
> Then we could fetch the logs locally with tools like getthelogs [3],
> or look into logs.openstack.org directly, if advanced beginners wish to.
> 
> Finally, we could identify the error-like patterns we discover (just
> do some grep, like I do with my tool [4]) and help with root cause
> analysis. Ideally, we would also submit new e-r queries (see also [5])
> and the corresponding lp bugs, and, absolutely ideally, help address
> those as well. This might be hard, though, as we may not be experts
> in some of these areas. Some of the error messages would literally
> mean nothing to us. For me, the most  But as a best effort, we
> could invite the right people to look into that, or at least ask
> folks on #tripleo or #openstack-infra.
> 
> [0]
> http://status.openstack.org/openstack-health/#/g/project/openstack-infra~2Ftripleo-ci
> [1]
> http://status.openstack.org/openstack-health/#/g/project/openstack~2Ftripleo-quickstart
> [2] http://status.openstack.org/elastic-recheck/data/others.html
> [3] https://review.openstack.org/#/c/492178/
> [4] https://github.com/bogdando/fuel-log-parse/blob/master/fuel-log-parse.sh
> [5]
> https://docs.openstack.org/infra/elastic-recheck/readme.html#running-queries-locally

Thanks Bogdan, this is very helpful. Do we have some docs/readme on [5]?
It is failing here with a bunch of 404s, so I presume I am missing a
proper elasticRecheck.conf file or some other settings?

I was basically trying to validate https://review.openstack.org/#/c/499516/
before submitting it.

Thanks,
Michele
-- 
Michele Baldessari
C2A5 9DA3 9961 4FFB E01B  D0BC DDD4 DCCB 7515 5C6D



Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-30 Thread Jeremy Stanley
On 2017-08-30 08:33:09 -0400 (-0400), Paul Belanger wrote:
[...]
> Regarding Bug 1713832 - Object PUT failed for
> zaqar_subscription[1], which was reverted last night: that is a
> great example to showcase elastic-recheck. Basically, if you look
> back at the logstash queries, you can see the signs pointing to an
> issue, but unfortunately it wasn't picked up until yesterday.
> 
> The info above from Bogdan is great. The general idea is: if a job
> fails in the check pipeline and elastic-recheck doesn't leave a
> comment, it is likely a new failure. Moving forward, we need to
> keep blind rechecks to a minimum, as each one has the
> potential to break the gate down the road.
[...]

And while it doesn't seem to have been exactly the case for this
one, we've seen plenty of examples in the past where people (on
various teams) have blindly rechecked a failing change to get it to
merge... and then when you later classify the bug and track it back
you find that it hit several times in the check pipeline on the very
change which introduced the problem to begin with. The point being: if
you recheck without knowing what broke or why, it's increasingly likely
you'll break things for everyone else too, and turn a few moments of
inconvenience for yourself into a week or more of pain for many
others.
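To put rough numbers on this: suppose a change introduces a race that fails each run independently with probability p (an idealized assumption; real failures are often correlated, and merging also needs a gate run). A minimal Python sketch of how likely that change is to land anyway if it is blindly rechecked up to n times:

```python
def lands_anyway(fail_rate, rechecks):
    """Probability that at least one of 1 + `rechecks` independent runs
    passes, i.e. that blind rechecking lets the buggy change through."""
    return 1 - fail_rate ** (rechecks + 1)


for n in range(4):
    print(f"{n} rechecks: {lands_anyway(0.5, n):.4f}")
```

With p = 0.5, three blind rechecks give the change a 93.75% chance of getting through, after which it starts failing for everyone else.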
-- 
Jeremy Stanley




Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-30 Thread Paul Belanger
On Wed, Aug 30, 2017 at 11:31:14AM +0200, Bogdan Dobrelya wrote:
> On 30.08.2017 6:54, Emilien Macchi wrote:
> > On Tue, Aug 29, 2017 at 4:17 PM, Emilien Macchi  wrote:
> >> We are currently dealing with 4 issues and until they are fixed, please
> >> do not approve any patch. We want to keep the gate clear to merge the
> >> fixes for the 4 problems first.
> >>
> >> 1) devstack-gate broke us because we use it as a library (bad)
> >> https://bugs.launchpad.net/tripleo/+bug/1713868
> >>
> >> 2) https://review.openstack.org/#/c/474578/ broke us and we're
> >> reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
> >>
> >> 3) We shouldn't build images on multinode jobs
> >> https://bugs.launchpad.net/tripleo/+bug/1713167
> >>
> >> 4) We should use pip instead of git for delorean
> >> https://bugs.launchpad.net/tripleo/+bug/1708832
> >>
> >>
> >> Until further notice from Alex or myself, please do not approve any patch.
> > 
> > The 4 problems have been mitigated.
> > You can now proceed to normal review.
> > 
> > Please do not recheck a patch without an elastic-recheck comment, we
> > need to track all issues related to CI from now.
> > Paul Belanger has been doing extremely useful work to help us, now
> > let's use elastic-recheck more and stop blind rechecks.
> > All known issues are in http://status.openstack.org/elastic-recheck/
> > If one is missing, you're welcome to contribute by sending a patch to
> > elastic-recheck. Example with https://review.openstack.org/#/c/498954/
> 
> That's a great example! Let me follow up on that and share my beginner's
> experience as well.
> 
> Let's help improve elastic-recheck queries to identify those
> unknown or new failures; this is really important. This also builds
> domain knowledge in particular areas, whether openstack, *-infra, or
> tripleo specific.
> 
> As beginners, we could start by watching for failing tripleo-ci
> periodic jobs [0],[1] (available as RSS feeds) and gate jobs without
> e-r comments, which are also listed on [2].
> 
> Then we could fetch the logs locally with tools like getthelogs [3],
> or look into logs.openstack.org directly, if advanced beginners wish to.
> 
> Finally, we could identify the error-like patterns we discover (just
> do some grep, like I do with my tool [4]) and help with root cause
> analysis. Ideally, we would also submit new e-r queries (see also [5])
> and the corresponding lp bugs, and, absolutely ideally, help address
> those as well. This might be hard, though, as we may not be experts
> in some of these areas. Some of the error messages would literally
> mean nothing to us. For me, the most  But as a best effort, we
> could invite the right people to look into that, or at least ask
> folks on #tripleo or #openstack-infra.
> 
> [0]
> http://status.openstack.org/openstack-health/#/g/project/openstack-infra~2Ftripleo-ci
> [1]
> http://status.openstack.org/openstack-health/#/g/project/openstack~2Ftripleo-quickstart
> [2] http://status.openstack.org/elastic-recheck/data/others.html
> [3] https://review.openstack.org/#/c/492178/
> [4] https://github.com/bogdando/fuel-log-parse/blob/master/fuel-log-parse.sh
> [5]
> https://docs.openstack.org/infra/elastic-recheck/readme.html#running-queries-locally
> 
> > 
> > I've restored all patches that were killed from the gate and did
> > recheck already, hopefully we can get some merges and finish this
> > release.
> > 
> > Thanks Paul and all Infra for their consistent help!
> > 
> 
Indeed, this looks much better this morning! Thanks to everybody for jumping
on the fixes.

Regarding Bug 1713832 - Object PUT failed for zaqar_subscription[1], which was
reverted last night: that is a great example to showcase elastic-recheck.
Basically, if you look back at the logstash queries, you can see the signs
pointing to an issue, but unfortunately it wasn't picked up until yesterday.

The info above from Bogdan is great. The general idea is: if a job fails in the
check pipeline and elastic-recheck doesn't leave a comment, it is likely a new
failure. Moving forward, we need to keep blind rechecks to a minimum, as each
one has the potential to break the gate down the road.
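The rule of thumb above can be written down as a tiny decision function. A Python sketch only: the input is assumed to be the list of bug numbers elastic-recheck cited in its comment on the change (empty when it left none); fetching that comment from Gerrit is out of scope here.

```python
from typing import List


def triage(job_failed: bool, er_bugs: List[int]) -> str:
    """Suggest the next step for a check-pipeline result.

    er_bugs: bug numbers elastic-recheck mentioned in its comment on the
    change; an empty list means it left no comment."""
    if not job_failed:
        return "passed: nothing to do"
    if er_bugs:
        # Known failure: a recheck is reasonable and is tracked against these bugs.
        return "known failure: " + ", ".join(f"bug {b}" for b in er_bugs)
    # No classification: likely a new failure, so look at the logs
    # instead of rechecking blindly.
    return "likely new failure: investigate before rechecking"
```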

This is why you see tripleo pushing upwards of 16 hours in the queue on
status.o.o/zuul: there was a job failure, and we had to rerun all the patches again.
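A simplified model of why one failure costs so much: in Zuul's gate pipeline, changes are tested on top of the changes queued ahead of them, so when one fails and is dropped, everything queued behind it has to be re-run. A Python sketch, assuming failures are detected front to back and each failing change fails once and is then removed (real Zuul scheduling is more involved):

```python
def total_runs(queue_len, failing_positions):
    """Number of change test runs in a simplified dependent gate queue.

    queue_len: changes in the queue; failing_positions: 0-based indexes of
    changes that fail. Each change starts one run; each failure restarts
    every change still queued behind the failing one."""
    runs = queue_len
    for pos in sorted(set(failing_positions)):
        runs += queue_len - 1 - pos
    return runs


print(total_runs(20, []))       # 20: no failures
print(total_runs(20, [0]))      # 39: the change at the head fails
print(total_runs(20, [0, 10]))  # 48: two failures
```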

Keep up the good work; I look forward to talking more about this at the PTG.

[1] http://status.openstack.org/elastic-recheck/#1713832



Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-30 Thread Bogdan Dobrelya
On 30.08.2017 6:54, Emilien Macchi wrote:
> On Tue, Aug 29, 2017 at 4:17 PM, Emilien Macchi  wrote:
>> We are currently dealing with 4 issues and until they are fixed, please
>> do not approve any patch. We want to keep the gate clear to merge the
>> fixes for the 4 problems first.
>>
>> 1) devstack-gate broke us because we use it as a library (bad)
>> https://bugs.launchpad.net/tripleo/+bug/1713868
>>
>> 2) https://review.openstack.org/#/c/474578/ broke us and we're
>> reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
>>
>> 3) We shouldn't build images on multinode jobs
>> https://bugs.launchpad.net/tripleo/+bug/1713167
>>
>> 4) We should use pip instead of git for delorean
>> https://bugs.launchpad.net/tripleo/+bug/1708832
>>
>>
>> Until further notice from Alex or myself, please do not approve any patch.
> 
> The 4 problems have been mitigated.
> You can now proceed to normal review.
> 
> Please do not recheck a patch without an elastic-recheck comment, we
> need to track all issues related to CI from now.
> Paul Belanger has been doing extremely useful work to help us, now
> let's use elastic-recheck more and stop blind rechecks.
> All known issues are in http://status.openstack.org/elastic-recheck/
> If one is missing, you're welcome to contribute by sending a patch to
> elastic-recheck. Example with https://review.openstack.org/#/c/498954/

That's a great example! Let me follow up on that and share my beginner's
experience as well.

Let's help improve elastic-recheck queries to identify those
unknown or new failures; this is really important. This also builds
domain knowledge in particular areas, whether openstack, *-infra, or
tripleo specific.

As beginners, we could start by watching for failing tripleo-ci
periodic jobs [0],[1] (available as RSS feeds) and gate jobs without
e-r comments, which are also listed on [2].

Then we could fetch the logs locally with tools like getthelogs [3],
or look into logs.openstack.org directly, if advanced beginners wish to.

Finally, we could identify the error-like patterns we discover (just
do some grep, like I do with my tool [4]) and help with root cause
analysis. Ideally, we would also submit new e-r queries (see also [5])
and the corresponding lp bugs, and, absolutely ideally, help address
those as well. This might be hard, though, as we may not be experts
in some of these areas. Some of the error messages would literally
mean nothing to us. For me, the most  But as a best effort, we
could invite the right people to look into that, or at least ask
folks on #tripleo or #openstack-infra.
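For the grep step, a rough Python sketch (not the fuel-log-parse.sh tool [4] itself): it scans a locally fetched log for error-like lines and masks digits, so that lines differing only in timestamps or ids collapse into one signature. That makes recurring patterns easier to spot and later to turn into an e-r query. The regular expression and the sample lines are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed markers of "error-like" lines; tune per job.
ERROR_RE = re.compile(r"\b(ERROR|FAILED|Traceback|fatal)\b")
DIGITS_RE = re.compile(r"\d+")


def error_signatures(lines, max_len=100):
    """Count error-like lines, masking digits so that lines differing only
    in timestamps, ids or ports collapse into a single signature."""
    counts = Counter()
    for line in lines:
        if ERROR_RE.search(line):
            signature = DIGITS_RE.sub("N", line.strip())[:max_len]
            counts[signature] += 1
    return counts


# Hypothetical log excerpt; in practice, read a log fetched locally.
sample = [
    "2017-08-30 10:00:01 ERROR Object PUT failed for zaqar_subscription 12",
    "2017-08-30 10:00:05 ERROR Object PUT failed for zaqar_subscription 13",
    "2017-08-30 10:00:09 INFO deployment step finished",
]
for signature, count in error_signatures(sample).most_common():
    print(count, signature)
```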

[0]
http://status.openstack.org/openstack-health/#/g/project/openstack-infra~2Ftripleo-ci
[1]
http://status.openstack.org/openstack-health/#/g/project/openstack~2Ftripleo-quickstart
[2] http://status.openstack.org/elastic-recheck/data/others.html
[3] https://review.openstack.org/#/c/492178/
[4] https://github.com/bogdando/fuel-log-parse/blob/master/fuel-log-parse.sh
[5]
https://docs.openstack.org/infra/elastic-recheck/readme.html#running-queries-locally

> 
> I've restored all patches that were killed from the gate and did
> recheck already, hopefully we can get some merges and finish this
> release.
> 
> Thanks Paul and all Infra for their consistent help!
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando



Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-30 Thread Marios Andreou
On Wed, Aug 30, 2017 at 7:54 AM, Emilien Macchi  wrote:

> On Tue, Aug 29, 2017 at 4:17 PM, Emilien Macchi 
> wrote:
> > We are currently dealing with 4 issues and until they are fixed, please
> > do not approve any patch. We want to keep the gate clear to merge the
> > fixes for the 4 problems first.
> >
> > 1) devstack-gate broke us because we use it as a library (bad)
> > https://bugs.launchpad.net/tripleo/+bug/1713868
> >
> > 2) https://review.openstack.org/#/c/474578/ broke us and we're
> > reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
> >
>

sorry to hear that :/ I just read that bug and had a look at the linked
logs from there (
http://logs.openstack.org/93/489393/9/gate/gate-tripleo-ci-centos-7-scenario002-multinode-oooq-container/043a4f5/logs/undercloud/home/jenkins/overcloud_deploy.log.txt.gz
and I also looked at the rest of the logs on that particular review
https://review.openstack.org/#/c/489393 ) but can't see anything about the
undercloud post-upgrade validations (& the bug is about zaqar?). Did the
revert help (sounds like it did, since the gate is fixed)? I'd like to look
into this some more and understand the bug better so I can work out how to
land the patch again safely. I'll try to catch you on irc later when you're
in, if you can spare some time.

thanks, marios



> > 3) We shouldn't build images on multinode jobs
> > https://bugs.launchpad.net/tripleo/+bug/1713167
> >
> > 4) We should use pip instead of git for delorean
> > https://bugs.launchpad.net/tripleo/+bug/1708832
> >
> >
> > Until further notice from Alex or myself, please do not approve any
> > patch.
>
> The 4 problems have been mitigated.
> You can now proceed to normal review.
>
> Please do not recheck a patch without an elastic-recheck comment, we
> need to track all issues related to CI from now.
> Paul Belanger has been doing extremely useful work to help us, now
> let's use elastic-recheck more and stop blind rechecks.
> All known issues are in http://status.openstack.org/elastic-recheck/
> If one is missing, you're welcome to contribute by sending a patch to
> elastic-recheck. Example with https://review.openstack.org/#/c/498954/
>
> I've restored all patches that were killed from the gate and did
> recheck already, hopefully we can get some merges and finish this
> release.
>
> Thanks Paul and all Infra for their consistent help!
> --
> Emilien Macchi


Re: [openstack-dev] [tripleo] Gate is broken - Do not approve any patch until further notice

2017-08-29 Thread Emilien Macchi
On Tue, Aug 29, 2017 at 4:17 PM, Emilien Macchi  wrote:
> We are currently dealing with 4 issues and until they are fixed, please
> do not approve any patch. We want to keep the gate clear to merge the
> fixes for the 4 problems first.
>
> 1) devstack-gate broke us because we use it as a library (bad)
> https://bugs.launchpad.net/tripleo/+bug/1713868
>
> 2) https://review.openstack.org/#/c/474578/ broke us and we're
> reverting it https://bugs.launchpad.net/tripleo/+bug/1713832
>
> 3) We shouldn't build images on multinode jobs
> https://bugs.launchpad.net/tripleo/+bug/1713167
>
> 4) We should use pip instead of git for delorean
> https://bugs.launchpad.net/tripleo/+bug/1708832
>
>
> Until further notice from Alex or myself, please do not approve any patch.

The 4 problems have been mitigated.
You can now proceed to normal review.

Please do not recheck a patch without an elastic-recheck comment, we
need to track all issues related to CI from now.
Paul Belanger has been doing extremely useful work to help us, now
let's use elastic-recheck more and stop blind rechecks.
All known issues are in http://status.openstack.org/elastic-recheck/
If one is missing, you're welcome to contribute by sending a patch to
elastic-recheck. Example with https://review.openstack.org/#/c/498954/
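For reference, a rough Python sketch of what such a contribution boils down to. It assumes, based on the queries/1713832.yaml path used elsewhere in this thread, that each known bug gets a file named after its Launchpad bug number under queries/, holding a single `query` key whose value is a Lucene query string. The `tags:"console"` filter and the phrase (taken from the bug title) are also assumptions, not the actual query for that bug; compare with existing files in the elastic-recheck repo before submitting.

```python
def er_query_file(bug, phrases, tag=None):
    """Return (path, contents) for a hypothetical elastic-recheck query file.

    Each phrase becomes an exact-phrase match on the log message; all
    clauses are ANDed together. Double quotes inside a phrase are escaped
    for Lucene."""
    clauses = ['message:"%s"' % p.replace('"', '\\"') for p in phrases]
    if tag:
        clauses.append('tags:"%s"' % tag)
    # YAML folded scalar: the line breaks below are folded into spaces.
    contents = "query: >\n  " + " AND\n  ".join(clauses) + "\n"
    return "queries/%d.yaml" % bug, contents


path, text = er_query_file(1713832, ["Object PUT failed"], tag="console")
print(path)
print(text)
```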

I've restored all patches that were killed from the gate and did
recheck already, hopefully we can get some merges and finish this
release.

Thanks Paul and all Infra for their consistent help!
-- 
Emilien Macchi
