Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-23 Thread Korzeniewski, Artur
Hi,
I have re-spun the Grenade multinode DVR config [1].

As I understand it, this job installs a multinode environment with the L3 
agent and metadata agent running on the subnode.
To take advantage of this setup:
a) Grenade tests should create a DVR router as a resource,
b) the Tempest smoke tests should interact with the DVR feature.

If a) and b) do not use the DVR feature, we would have a DVR-aware setup 
configured but no real interaction with DVR.
By default, Grenade jobs launch only the Tempest smoke tests, not the 
full Tempest suite.

To avoid having two jobs running the Grenade multinode setup, we can enable 
only the DVR one in the check queue.
To get proper interaction with the DVR feature, we can adjust the Grenade 
tests and the Tempest smoke suite.

Regards,
Artur Korzeniewski
IRC: korzen

[1] https://review.openstack.org/#/c/250215/4

From: Armando M. [mailto:arma...@gmail.com]
Sent: Monday, February 22, 2016 6:01 PM
To: OpenStack Development Mailing List (not for usage questions) 

Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial 
upgrade - Nova metadata failure



On 22 February 2016 at 08:52, Ihar Hrachyshka wrote:
Armando M. wrote:


On 22 February 2016 at 04:56, Ihar Hrachyshka wrote:
Sean M. Collins wrote:

Armando M. wrote:
Now that the blocking issue has been identified, I filed project-config
change [1] to enable us to test the Neutron Grenade multinode more
thoroughly.

[1] https://review.openstack.org/#/c/282428/


Indeed - I want to profusely thank everyone that I reached out to during
these past months when I got stuck on this. Ihar, Matt K, Kevin B,
Armando - this is a huge win.

--
Sean M. Collins

Thanks everyone to make that latest push. We are almost there!..

I guess the next steps are:
- monitoring the job for a week, making sure it’s stable enough (comparing 
failure rate to non-partial grenade job?);

Btw, the job trend is here:

http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen

I'd prefer to wait a little longer. Depending on how things go we may want to 
make it not until N opens up.

Agreed.

- if everything goes fine, propose project-config change to make it voting;
- propose governance patch to enable rolling-upgrade tag for neutron repo (I 
believe not for *aas repos though?).

I guess with that we would be able to claim victory for the basic 'server vs. 
agent’ part of rolling scenario. Right?

Follow up steps would probably be:
- look at enabling partial job for DVR flavour;

That should be only instrumental to see how sane DVR during upgrades is, and 
proceed in tweaking the existing grenade-multi job in the check queue to be 
dvr-aware. In other words: I personally wouldn't want to see two grenade jobs 
in the gate.

Ack, that would be the end goal. There still may be some short time when both 
are in gate.

- proceed on objectification of neutron db layer to open doors for later mixed 
server versions in the same cluster.

Anything I missed?

Also, what do we do with non-partial flavour of the job? Is it staying?

What job are you talking about exactly?

gate-grenade-dsvm-neutron

It’s not ‘partial’ in that we don’t run mixed versions of components during 
tempest run. It only covers that new code can run using old configuration 
files, and that alembic migrations apply correctly for some limited number of 
so called ‘long standing’ resources like instances created on the ‘old’ side of 
grenade.

Yes, that is staying. Especially considering that's part of the integrate gate 
on a bunch of other projects. We'll reconsider what to do, once we strengthen 
our rolling upgrade story.



Ihar

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-22 Thread Armando M.
On 22 February 2016 at 08:52, Ihar Hrachyshka  wrote:

> Armando M.  wrote:
>
>
>>
>> On 22 February 2016 at 04:56, Ihar Hrachyshka 
>> wrote:
>> Sean M. Collins  wrote:
>>
>> Armando M. wrote:
>> Now that the blocking issue has been identified, I filed project-config
>> change [1] to enable us to test the Neutron Grenade multinode more
>> thoroughly.
>>
>> [1] https://review.openstack.org/#/c/282428/
>>
>>
>> Indeed - I want to profusely thank everyone that I reached out to during
>> these past months when I got stuck on this. Ihar, Matt K, Kevin B,
>> Armando - this is a huge win.
>>
>> --
>> Sean M. Collins
>>
>> Thanks everyone to make that latest push. We are almost there!..
>>
>> I guess the next steps are:
>> - monitoring the job for a week, making sure it’s stable enough
>> (comparing failure rate to non-partial grenade job?);
>>
>> Btw, the job trend is here:
>>
>>
>> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen
>>
>> I'd prefer to wait a little longer. Depending on how things go we may
>> want to make it not until N opens up.
>>
>
> Agreed.
>
>
>> - if everything goes fine, propose project-config change to make it
>> voting;
>> - propose governance patch to enable rolling-upgrade tag for neutron repo
>> (I believe not for *aas repos though?).
>>
>> I guess with that we would be able to claim victory for the basic 'server
>> vs. agent’ part of rolling scenario. Right?
>>
>> Follow up steps would probably be:
>> - look at enabling partial job for DVR flavour;
>>
>> That should be only instrumental to see how sane DVR during upgrades is,
>> and proceed in tweaking the existing grenade-multi job in the check queue
>> to be dvr-aware. In other words: I personally wouldn't want to see two
>> grenade jobs in the gate.
>>
>
> Ack, that would be the end goal. There still may be some short time when
> both are in gate.
>
>
>> - proceed on objectification of neutron db layer to open doors for later
>> mixed server versions in the same cluster.
>>
>> Anything I missed?
>>
>> Also, what do we do with non-partial flavour of the job? Is it staying?
>>
>> What job are you talking about exactly?
>>
>
> gate-grenade-dsvm-neutron
>
> It’s not ‘partial’ in that we don’t run mixed versions of components
> during tempest run. It only covers that new code can run using old
> configuration files, and that alembic migrations apply correctly for some
> limited number of so called ‘long standing’ resources like instances
> created on the ‘old’ side of grenade.


Yes, that is staying, especially considering that it's part of the integrated
gate on a bunch of other projects. We'll reconsider what to do once we
strengthen our rolling upgrade story.


>
>
> Ihar
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-22 Thread Ihar Hrachyshka

Armando M.  wrote:




On 22 February 2016 at 04:56, Ihar Hrachyshka  wrote:
Sean M. Collins  wrote:

Armando M. wrote:
Now that the blocking issue has been identified, I filed project-config
change [1] to enable us to test the Neutron Grenade multinode more
thoroughly.

[1] https://review.openstack.org/#/c/282428/


Indeed - I want to profusely thank everyone that I reached out to during
these past months when I got stuck on this. Ihar, Matt K, Kevin B,
Armando - this is a huge win.

--
Sean M. Collins

Thanks everyone to make that latest push. We are almost there!..

I guess the next steps are:
- monitoring the job for a week, making sure it’s stable enough  
(comparing failure rate to non-partial grenade job?);


Btw, the job trend is here:

http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen

I'd prefer to wait a little longer. Depending on how things go we may  
want to make it not until N opens up.


Agreed.



- if everything goes fine, propose project-config change to make it voting;
- propose governance patch to enable rolling-upgrade tag for neutron repo  
(I believe not for *aas repos though?).


I guess with that we would be able to claim victory for the basic 'server  
vs. agent’ part of rolling scenario. Right?


Follow up steps would probably be:
- look at enabling partial job for DVR flavour;

That should be only instrumental to see how sane DVR during upgrades is,  
and proceed in tweaking the existing grenade-multi job in the check queue  
to be dvr-aware. In other words: I personally wouldn't want to see two  
grenade jobs in the gate.


Ack, that would be the end goal. There still may be some short time when  
both are in gate.




- proceed on objectification of neutron db layer to open doors for later  
mixed server versions in the same cluster.


Anything I missed?

Also, what do we do with non-partial flavour of the job? Is it staying?

What job are you talking about exactly?


gate-grenade-dsvm-neutron

It’s not ‘partial’ in that we don’t run mixed versions of components during  
tempest run. It only covers that new code can run using old configuration  
files, and that alembic migrations apply correctly for some limited number  
of so called ‘long standing’ resources like instances created on the ‘old’  
side of grenade.


Ihar



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-22 Thread Armando M.
On 22 February 2016 at 04:56, Ihar Hrachyshka  wrote:

> Sean M. Collins  wrote:
>
> Armando M. wrote:
>>
>>> Now that the blocking issue has been identified, I filed project-config
>>> change [1] to enable us to test the Neutron Grenade multinode more
>>> thoroughly.
>>>
>>> [1] https://review.openstack.org/#/c/282428/
>>>
>>
>>
>> Indeed - I want to profusely thank everyone that I reached out to during
>> these past months when I got stuck on this. Ihar, Matt K, Kevin B,
>> Armando - this is a huge win.
>>
>> --
>> Sean M. Collins
>>
>
> Thanks everyone to make that latest push. We are almost there!..
>
> I guess the next steps are:
> - monitoring the job for a week, making sure it’s stable enough (comparing
> failure rate to non-partial grenade job?);


Btw, the job trend is here:

http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=6&fullscreen

I'd prefer to wait a little longer. Depending on how things go, we may not
want to make it voting until N opens up.


> - if everything goes fine, propose project-config change to make it voting;
> - propose governance patch to enable rolling-upgrade tag for neutron repo
> (I believe not for *aas repos though?).
>
> I guess with that we would be able to claim victory for the basic 'server
> vs. agent’ part of rolling scenario. Right?
>
> Follow up steps would probably be:
> - look at enabling partial job for DVR flavour;
>

That should only be a means of seeing how sane DVR is during upgrades; we
would then proceed to tweak the existing grenade-multi job in the check queue
to be dvr-aware. In other words: I personally wouldn't want to see two
grenade jobs in the gate.


> - proceed on objectification of neutron db layer to open doors for later
> mixed server versions in the same cluster.
>
> Anything I missed?
>
> Also, what do we do with non-partial flavour of the job? Is it staying?


What job are you talking about exactly?


>
>
> Ihar
>
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-22 Thread Ihar Hrachyshka

Sean M. Collins  wrote:




Also, what do we do with non-partial flavour of the job? Is it staying?


Is it useful? I think operators are more likely to upgrade
components of a cluster incrementally - so the partial jobs are going to
reflect the reality on the ground better.


I guess we could give them some more time to change their upgrade  
practices, e.g. by putting a specific release note on the supported scenario  
in Mitaka, keeping the non-partial job till Newton-1, and only then  
dropping it.


Ihar



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-22 Thread Sean M. Collins
Ihar Hrachyshka wrote:
> I guess the next steps are:
> - monitoring the job for a week, making sure it’s stable enough (comparing
> failure rate to non-partial grenade job?);
> - if everything goes fine, propose project-config change to make it voting;
> - propose governance patch to enable rolling-upgrade tag for neutron repo (I
> believe not for *aas repos though?).

Agreed - I believe these are our next steps.

> I guess with that we would be able to claim victory for the basic 'server
> vs. agent’ part of rolling scenario. Right?

Correct - it also tests "new agent vs. old agent", since the
primary node runs the neutron agent and upgrades it, while the subnode runs
a neutron agent that is not upgraded.

I think there could be some work done to ensure that instances are
scheduled on both the primary node and the subnode so we get more
coverage.

> Follow up steps would probably be:
> - look at enabling partial job for DVR flavour;

Agree - I guess we can resurrect https://review.openstack.org/#/c/250215 ?

> - proceed on objectification of neutron db layer to open doors for later
> mixed server versions in the same cluster.

Sounds good.

> Anything I missed?

I think that's a good starting point. If we think of other things we
can add them.

> Also, what do we do with non-partial flavour of the job? Is it staying?

Is it useful? I think operators are more likely to upgrade
components of a cluster incrementally - so the partial jobs are going to
reflect the reality on the ground better.

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-22 Thread Ihar Hrachyshka

Sean M. Collins  wrote:


Armando M. wrote:

Now that the blocking issue has been identified, I filed project-config
change [1] to enable us to test the Neutron Grenade multinode more
thoroughly.

[1] https://review.openstack.org/#/c/282428/



Indeed - I want to profusely thank everyone that I reached out to during
these past months when I got stuck on this. Ihar, Matt K, Kevin B,
Armando - this is a huge win.

--
Sean M. Collins


Thanks to everyone for making that latest push. We are almost there!

I guess the next steps are:
- monitoring the job for a week, making sure it’s stable enough (comparing  
failure rate to non-partial grenade job?);

- if everything goes fine, propose project-config change to make it voting;
- propose governance patch to enable rolling-upgrade tag for neutron repo  
(I believe not for *aas repos though?).


I guess with that we would be able to claim victory for the basic 'server  
vs. agent’ part of rolling scenario. Right?


Follow up steps would probably be:
- look at enabling partial job for DVR flavour;
- proceed on objectification of neutron db layer to open doors for later  
mixed server versions in the same cluster.


Anything I missed?

Also, what do we do with non-partial flavour of the job? Is it staying?

Ihar



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-19 Thread Sean M. Collins
Armando M. wrote:
> Now that the blocking issue has been identified, I filed project-config
> change [1] to enable us to test the Neutron Grenade multinode more
> thoroughly.
> 
> [1] https://review.openstack.org/#/c/282428/


Indeed - I want to profusely thank everyone that I reached out to during
these past months when I got stuck on this. Ihar, Matt K, Kevin B,
Armando - this is a huge win.

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-19 Thread Vasudevan, Swaminathan (PNB Roseville)
Hi Folks,
Great Job!

Thanks
Swami

From: Armando M. [mailto:arma...@gmail.com]
Sent: Friday, February 19, 2016 9:07 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial 
upgrade - Nova metadata failure



On 19 February 2016 at 04:43, Sean Dague wrote:
On 02/18/2016 09:50 PM, Armando M. wrote:
>
>
> On 18 February 2016 at 08:41, Sean M. Collins wrote:
>
> This week's update:
>
> Armando was kind enough to take a look[1], since he's got a fresh
> perspective. I think I've been suffering from Target Fixation[1]
> where I failed to notice a couple other failures in the logs.
>
>
> It's been fun, and I am glad I was able to help. Once I validated the
> root cause of the metadata failure [1], I got run [2] and a clean pass
> in [3] :)
>
> There are still a few things to iron out, ie. choosing metadata over
> config-drive, testing both in the gate etc. But that's for another day.
>
> Cheers,
> Armando
>
> [1] https://bugs.launchpad.net/nova/+bug/1545101/comments/4
> [2] 
> http://logs.openstack.org/00/281600/6/experimental/gate-grenade-dsvm-neutron-multinode/40e16c8/
> [3] 
> http://logs.openstack.org/00/281600/6/experimental/gate-grenade-dsvm-neutron-multinode/40e16c8/logs/testr_results.html.gz

I want to thank everyone that's been working on this issue profusely.
This exposed a release critical bug in Nova that we would not have
caught otherwise. Finding that before milestone 3 is a huge win and
gives us a lot more options in fixing it correctly.

I think we've got the proper fix now -
https://review.openstack.org/#/c/279721/ (fingers crossed). The metadata
server is one of the least tested components we've got on the Nova side,
so I'll be looking at ways to fix that problem and hopefully avoid
situations like this again.

Now that the blocking issue has been identified, I filed project-config change 
[1] to enable us to test the Neutron Grenade multinode more thoroughly.

[1] https://review.openstack.org/#/c/282428/


-Sean

--
Sean Dague
http://dague.net



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-19 Thread Armando M.
On 19 February 2016 at 04:43, Sean Dague  wrote:

> On 02/18/2016 09:50 PM, Armando M. wrote:
> >
> >
> > On 18 February 2016 at 08:41, Sean M. Collins wrote:
> >
> > This week's update:
> >
> > Armando was kind enough to take a look[1], since he's got a fresh
> > perspective. I think I've been suffering from Target Fixation[1]
> > where I failed to notice a couple other failures in the logs.
> >
> >
> > It's been fun, and I am glad I was able to help. Once I validated the
> > root cause of the metadata failure [1], I got run [2] and a clean pass
> > in [3] :)
> >
> > There are still a few things to iron out, ie. choosing metadata over
> > config-drive, testing both in the gate etc. But that's for another day.
> >
> > Cheers,
> > Armando
> >
> > [1] https://bugs.launchpad.net/nova/+bug/1545101/comments/4
> > [2]
> http://logs.openstack.org/00/281600/6/experimental/gate-grenade-dsvm-neutron-multinode/40e16c8/
> > [3]
> http://logs.openstack.org/00/281600/6/experimental/gate-grenade-dsvm-neutron-multinode/40e16c8/logs/testr_results.html.gz
>
> I want to thank everyone that's been working on this issue profusely.
> This exposed a release critical bug in Nova that we would not have
> caught otherwise. Finding that before milestone 3 is a huge win and
> gives us a lot more options in fixing it correctly.
>
> I think we've got the proper fix now -
> https://review.openstack.org/#/c/279721/ (fingers crossed). The metadata
> server is one of the least tested components we've got on the Nova side,
> so I'll be looking at ways to fix that problem and hopefully avoid
> situations like this again.
>

Now that the blocking issue has been identified, I filed project-config
change [1] to enable us to test the Neutron Grenade multinode more
thoroughly.

[1] https://review.openstack.org/#/c/282428/


> -Sean
>
> --
> Sean Dague
> http://dague.net
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade - Nova metadata failure

2016-02-19 Thread Sean Dague
On 02/18/2016 09:50 PM, Armando M. wrote:
> 
> 
> On 18 February 2016 at 08:41, Sean M. Collins wrote:
> 
> This week's update:
> 
> Armando was kind enough to take a look[1], since he's got a fresh
> perspective. I think I've been suffering from Target Fixation[1]
> where I failed to notice a couple other failures in the logs.
> 
> 
> It's been fun, and I am glad I was able to help. Once I validated the
> root cause of the metadata failure [1], I got run [2] and a clean pass
> in [3] :)
> 
> There are still a few things to iron out, ie. choosing metadata over
> config-drive, testing both in the gate etc. But that's for another day.
> 
> Cheers,
> Armando
> 
> [1] https://bugs.launchpad.net/nova/+bug/1545101/comments/4
> [2] 
> http://logs.openstack.org/00/281600/6/experimental/gate-grenade-dsvm-neutron-multinode/40e16c8/
> [3] 
> http://logs.openstack.org/00/281600/6/experimental/gate-grenade-dsvm-neutron-multinode/40e16c8/logs/testr_results.html.gz

I want to thank everyone that's been working on this issue profusely.
This exposed a release critical bug in Nova that we would not have
caught otherwise. Finding that before milestone 3 is a huge win and
gives us a lot more options in fixing it correctly.

I think we've got the proper fix now -
https://review.openstack.org/#/c/279721/ (fingers crossed). The metadata
server is one of the least tested components we've got on the Nova side,
so I'll be looking at ways to fix that problem and hopefully avoid
situations like this again.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-02-18 Thread Armando M.
On 18 February 2016 at 08:41, Sean M. Collins  wrote:

> This week's update:
>
> Armando was kind enough to take a look[1], since he's got a fresh
> perspective. I think I've been suffering from Target Fixation[2]
> where I failed to notice a couple other failures in the logs.
>

It's been fun, and I am glad I was able to help. Once I validated the root
cause of the metadata failure [1], I got run [2] and a clean pass in [3] :)

There are still a few things to iron out, e.g. choosing metadata over
config-drive, testing both in the gate, etc. But that's for another day.

Cheers,
Armando

[1] https://bugs.launchpad.net/nova/+bug/1545101/comments/4
[2]
http://logs.openstack.org/00/281600/6/experimental/gate-grenade-dsvm-neutron-multinode/40e16c8/
[3]
http://logs.openstack.org/00/281600/6/experimental/gate-grenade-dsvm-neutron-multinode/40e16c8/logs/testr_results.html.gz



>
> For example - during the SSH test into the instances, we are able to get
> a full SSH handshake and offer up the SSH key, however authentication
> fails[3], apparently due to the fact that the instance is not successful
> in contacting the metadata service and getting the SSH public key[4].
>
> So, I think the next bit of work is to track down why the metadata
> service isn't functioning properly. We pinged Matt Riedemann about one
> error we saw over in the nova metadata service, however he had seen it
> before us and already wrote a fix[5].
>
> That's the status of where things stand. Metadata service being broken,
> and also still MTU issues lurking in the background.
>
> [1]:
> http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2016-02-18.log.html#t2016-02-18T00:26:29
> [2]: https://en.wikipedia.org/wiki/Target_fixation
> [3]:
> http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2016-02-18.log.html#t2016-02-18T01:18:32
> [4]:
> http://logs.openstack.org/78/279378/9/experimental/gate-grenade-dsvm-neutron-multinode/40a5659/console.html#_2016-02-17_22_37_33_277
> [5]: https://review.openstack.org/#/c/279721/
> --
> Sean M. Collins
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-02-18 Thread Clark Boylan
On Wed, Feb 10, 2016, at 09:52 AM, Sean M. Collins wrote:
> Ihar Hrachyshka wrote:
> > Also, I added some interface state dump for worlddump, and here is how the
> > main node networking setup looks like:
> > 
> > http://logs.openstack.org/59/265759/20/experimental/gate-grenade-dsvm-neutron-multinode/d64a6e6/logs/worlddump-2016-01-30-164508.txt.gz
> > 
> > br-ex: mtu = 1450
> > inside router: qg mtu = 1450, qr = 1450
> > 
> > So should be fine in this regard. I also set devstack locally enforcing
> > network_device_mtu, and it seems to pass packets of 1450 size through. So
> > it’s probably something tunneling packets to the subnode that fails for us,
> > not local router-to-tap bits.
> 
> Yeah! That's right. So is it the case that we need to do 1500 less the
> GRE overhead less the VXLAN overhead? So 1446? Since the traffic gets
> > encapsulated in VXLAN then encapsulated in GRE (yo dawg, I heard u like
> tunneling).

Looks like you made progress further debugging the problems here and
metadata service is the culprit. But I want to point out that we
shouldn't be nesting tunnels here (at least not in a way that is exposed
to us, the underlying cloud could be doing whatever). br-int is the
neutron managed tunnel using vxlan and that is the only layer of
tunneling for br-int. br-ex is part of the devstack-gate managed VXLAN
tunnel (formerly GRE until new clouds started rejecting GRE packets) on
the DVR jobs but not the normal multinode or grenade jobs because the
DVR job is the only one with more than one router.

All that to say 1450 should be a sufficiently small MTU.
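
The 1450 figure can be checked with a little arithmetic. What follows is a minimal sketch, assuming an IPv4 underlay with a 1500-byte MTU and standard header sizes with no IP options; the constant and function names are made up for illustration and do not come from devstack-gate or Neutron. The nested case assumes plain GRE carrying the outer VXLAN IP packet with no GRE key; an L2 GRE tunnel would add a further Ethernet header.

```python
# Header sizes in bytes (IPv4 underlay, no IP options). Names are illustrative.
OUTER_IPV4 = 20
UDP = 8
VXLAN = 8
INNER_ETHERNET = 14
GRE_BASE = 4  # basic GRE header, no key/checksum/sequence fields

# One VXLAN layer: outer IP + UDP + VXLAN header + encapsulated Ethernet header.
VXLAN_OVERHEAD = OUTER_IPV4 + UDP + VXLAN + INNER_ETHERNET  # 50

# Assumption: plain GRE wrapping an IP packet (outer IP + GRE header only).
GRE_L3_OVERHEAD = OUTER_IPV4 + GRE_BASE  # 24


def inner_mtu(underlay_mtu, *overheads):
    """Return the largest inner IP packet size left after each tunnel layer."""
    mtu = underlay_mtu - sum(overheads)
    if mtu <= 0:
        raise ValueError("tunnel overhead exceeds the underlay MTU")
    return mtu


# Single VXLAN layer, as described above.
print(inner_mtu(1500, VXLAN_OVERHEAD))                   # 1450
# Hypothetical VXLAN-inside-GRE nesting, under the assumptions stated above.
print(inner_mtu(1500, VXLAN_OVERHEAD, GRE_L3_OVERHEAD))  # 1426
```

With a single VXLAN layer the result matches the 1450 already configured for the job. Under these assumptions a nested tunnel would need a smaller value still, which is why it matters that no nesting is exposed here.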

Clark



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-02-18 Thread Sean M. Collins
This week's update:

Armando was kind enough to take a look[1], since he's got a fresh
perspective. I think I've been suffering from Target Fixation[2]
where I failed to notice a couple other failures in the logs.

For example - during the SSH test into the instances, we are able to get
a full SSH handshake and offer up the SSH key, however authentication
fails[3], apparently due to the fact that the instance is not successful
in contacting the metadata service and getting the SSH public key[4].
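
One way to confirm this kind of failure from inside a guest is to request the key directly from the metadata service. The following is a rough sketch, assuming the guest image has Python 3 and that the key is served at the EC2-compatible path on the usual link-local address; the helper names are hypothetical and this is not what the Tempest test itself does.

```python
import urllib.request

# Link-local address conventionally used for the metadata service.
METADATA_HOST = "http://169.254.169.254"


def public_key_url(host=METADATA_HOST, index=0):
    """Build the EC2-compatible URL for the instance's SSH public key."""
    return f"{host}/latest/meta-data/public-keys/{index}/openssh-key"


def fetch_public_key(url=None, timeout=5.0):
    """Return the key text, or None if the metadata service is unreachable."""
    url = url or public_key_url()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError as exc:  # URLError, HTTPError and socket timeouts
        print(f"metadata request failed: {exc}")
        return None
```

Calling `fetch_public_key()` from the instance and getting `None` back would point at the metadata path rather than at SSH itself, which is consistent with the handshake succeeding while authentication fails.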

So, I think the next bit of work is to track down why the metadata
service isn't functioning properly. We pinged Matt Riedemann about one
error we saw over in the nova metadata service, however he had seen it
before us and already wrote a fix[5].

That's the status of where things stand. Metadata service being broken,
and also still MTU issues lurking in the background.

[1]: 
http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2016-02-18.log.html#t2016-02-18T00:26:29
[2]: https://en.wikipedia.org/wiki/Target_fixation
[3]: 
http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2016-02-18.log.html#t2016-02-18T01:18:32
[4]: 
http://logs.openstack.org/78/279378/9/experimental/gate-grenade-dsvm-neutron-multinode/40a5659/console.html#_2016-02-17_22_37_33_277
[5]: https://review.openstack.org/#/c/279721/
-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-02-10 Thread Sean M. Collins
Ihar Hrachyshka wrote:
> Actually, we already have 1450 for network_device_mtu for the job since:
> 
> https://review.openstack.org/#/c/267847/4/devstack-vm-gate.sh
> 

Ah! Forgot about that one. Cool.

> Also, I added some interface state dump for worlddump, and here is how the
> main node networking setup looks like:
> 
> http://logs.openstack.org/59/265759/20/experimental/gate-grenade-dsvm-neutron-multinode/d64a6e6/logs/worlddump-2016-01-30-164508.txt.gz
> 
> br-ex: mtu = 1450
> inside router: qg mtu = 1450, qr = 1450
> 
> So should be fine in this regard. I also set devstack locally enforcing
> network_device_mtu, and it seems to pass packets of 1450 size through. So
> it’s probably something tunneling packets to the subnode that fails for us,
> not local router-to-tap bits.

Yeah! That's right. So is it the case that we need to do 1500 less the
GRE overhead less the VXLAN overhead? So 1446? Since the traffic gets
encapsulated in VXLAN then encapsulated in GRE (yo dawg, I heard u like
tunneling).

http://baturin.org/tools/encapcalc/
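
For what it's worth, the subtraction can be written out as a small Python sketch. The header sizes are assumptions (IPv4 outer headers without options, a base GRE header with no key or checksum, VXLAN carrying the inner Ethernet header); they are not taken from the gate configuration, and different tunnel options change the result.

```python
# Assumed per-layer overheads in bytes (IPv4, no options). These are
# illustrative values, not measurements from the gate.
VXLAN_OVERHEAD = 20 + 8 + 8 + 14   # outer IP + UDP + VXLAN + inner Ethernet
GRE_OVERHEAD = 20 + 4              # outer IP + base GRE header


def inner_mtu(outer_mtu, *overheads):
    """Return the largest inner MTU that fits once every overhead is paid."""
    mtu = outer_mtu - sum(overheads)
    if mtu <= 0:
        raise ValueError("overheads exceed the outer MTU")
    return mtu


print(inner_mtu(1500, VXLAN_OVERHEAD))                # VXLAN only
print(inner_mtu(1500, GRE_OVERHEAD, VXLAN_OVERHEAD))  # VXLAN inside GRE
```

Under these assumptions VXLAN alone leaves 1450 and VXLAN inside GRE leaves 1426; a GRE key or an Ethernet-carrying GRE tunnel would lower that further.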


> 
> I also see br-tun having 1500. Is it a problem? Probably not, but I admit I
> miss a lot in this topic so far.

Dunno. Maybe?

> Also I see some qg-2c68fb65-21 device in the worlddump output from above in
> global namespace. The device has mtu = 1500. Which router does the device
> belong to?..

Good question.

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-02-10 Thread Ihar Hrachyshka

Sean M. Collins wrote:

> Ihar Hrachyshka wrote:
>> UPD: seems like enforcing instance mtu to 1400 indeed makes us pass forward
>> into tempest:
>>
>> http://logs.openstack.org/59/265759/3/experimental/gate-grenade-dsvm-neutron-multinode/a167a59/console.html
>>
>> And there are only three failures there:
>>
>> http://logs.openstack.org/59/265759/3/experimental/gate-grenade-dsvm-neutron-multinode/a167a59/console.html#_2016-01-11_11_58_47_945
>>
>> I also don’t see any RPC versioning related traces in service logs, which is
>> a good sign.
>
> Just an update - we are still stuck on those three tempest tests.
>
> I was able to dig a bit and it looks like it's still an MTU issue.
>
> http://logs.openstack.org/35/187235/14/experimental/gate-grenade-dsvm-neutron-multinode/c5eda62/logs/tempest.txt.gz#_2016-02-09_20_37_40_044
>
> "SSHException: Error reading SSH protocol banner[Errno 104] Connection
> reset by peer"

Note that this time we get reset immediately instead of being stuck there
until timeout.

> I tried pushing down a patch to cram network_device_mtu down to 1450 in
> the hopes that it would do the trick - but sadly that didn't fix. I'm


Actually, we already have 1450 for network_device_mtu for the job since:

https://review.openstack.org/#/c/267847/4/devstack-vm-gate.sh

Also, I added some interface state dump for worlddump, and here is how the  
main node networking setup looks like:


http://logs.openstack.org/59/265759/20/experimental/gate-grenade-dsvm-neutron-multinode/d64a6e6/logs/worlddump-2016-01-30-164508.txt.gz

br-ex: mtu = 1450
inside router: qg mtu = 1450, qr = 1450

So should be fine in this regard. I also set devstack locally enforcing  
network_device_mtu, and it seems to pass packets of 1450 size through. So  
it’s probably something tunneling packets to the subnode that fails for us,  
not local router-to-tap bits.
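
One way to check where large packets stop fitting is a don't-fragment ping sized to the MTU under test. A minimal sketch of the size arithmetic, assuming IPv4 and Linux iputils ping (its -M do option forbids fragmentation and -s sets the ICMP payload size); the target address is a placeholder, not an address from the job:

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
ICMP_HEADER = 8    # bytes, ICMP echo header


def ping_payload_for_mtu(mtu):
    """ICMP payload size that yields an IPv4 packet of exactly `mtu` bytes."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU too small for an ICMP echo")
    return payload


def ping_command(target, mtu):
    """Build a don't-fragment ping command line (Linux iputils syntax)."""
    size = str(ping_payload_for_mtu(mtu))
    return ["ping", "-c", "3", "-M", "do", "-s", size, target]


# 192.0.2.10 is a documentation address standing in for the instance IP.
print(" ".join(ping_command("192.0.2.10", 1450)))
```

Running the resulting command from the main node towards an instance on the subnode, and then lowering the MTU argument until replies come back, would show roughly how much room the tunnel path really leaves.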


I also see br-tun having 1500. Is it a problem? Probably not, but I admit I  
miss a lot in this topic so far.


Also I see some qg-2c68fb65-21 device in the worlddump output from above in  
global namespace. The device has mtu = 1500. Which router does the device  
belong to?..
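
Spotting stray devices like that one by eye gets tedious, so here is a rough Python sketch that lists devices whose MTU differs from the expected value. It assumes the dump contains lines in the style of `ip -o link show`; the sample text is hypothetical and only reuses the device names and MTU values mentioned above.

```python
import re

# Matches lines shaped like `ip -o link show` output, e.g.
# "5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue ..."
LINK_RE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:\s.*?\bmtu\s+(\d+)")


def devices_with_other_mtu(ip_link_output, expected_mtu):
    """Return {device: mtu} for devices whose MTU differs from expected_mtu."""
    odd = {}
    for line in ip_link_output.splitlines():
        match = LINK_RE.match(line.strip())
        if match and int(match.group(2)) != expected_mtu:
            odd[match.group(1)] = int(match.group(2))
    return odd


# Hypothetical sample text; only the names and MTUs mirror the thread.
SAMPLE = """\
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
6: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
9: qg-2c68fb65-21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
"""

print(devices_with_other_mtu(SAMPLE, 1450))
```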




> going to have to keep digging. I am almost certain it's something that
> Matt K (Sam-I-Am) has already made note of in his research.


Actually, I don’t think Matt ran any tests for MTU that is reduced  
comparing to ‘standard’ 1500 size. It would be interesting to see how it  
goes in his lab with the limited mtu size we use in gate.


Ihar



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-02-10 Thread Sean M. Collins
Ihar Hrachyshka wrote:
> UPD: seems like enforcing instance mtu to 1400 indeed makes us pass forward
> into tempest:
> 
> http://logs.openstack.org/59/265759/3/experimental/gate-grenade-dsvm-neutron-multinode/a167a59/console.html
> 
> And there are only three failures there:
> 
> http://logs.openstack.org/59/265759/3/experimental/gate-grenade-dsvm-neutron-multinode/a167a59/console.html#_2016-01-11_11_58_47_945
> 
> I also don’t see any RPC versioning related traces in service logs, which is
> a good sign.
> 

Just an update - we are still stuck on those three tempest tests.

I was able to dig a bit and it looks like it's still an MTU issue.


http://logs.openstack.org/35/187235/14/experimental/gate-grenade-dsvm-neutron-multinode/c5eda62/logs/tempest.txt.gz#_2016-02-09_20_37_40_044

"SSHException: Error reading SSH protocol banner[Errno 104] Connection reset by 
peer"

I tried pushing down a patch to cram network_device_mtu down to 1450 in
the hopes that it would do the trick - but sadly that didn't fix. I'm
going to have to keep digging. I am almost certain it's something that
Matt K (Sam-I-Am) has already made note of in his research.


-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-01-12 Thread Ihar Hrachyshka

Clark Boylan wrote:

> On Mon, Jan 11, 2016, at 12:35 PM, Sean M. Collins wrote:
>> Nice find. I actually pushed a patch recently that we should be
>> advertising the MTU by default. I think this really shows that it should
>> be enabled by default.
>>
>> https://review.openstack.org/263486l
>
> ++ Neutron should be able to determine what the outer MTU is and adjust
> the advertised inner MTU automatically based on the overhead required
> for whatever tunnel protocol is in use all without the deployer or cloud
> user needing to know anything special.


Let me clarify: there is still a requirement for the image you boot to honour
one of the ways we advertise the MTU from neutron (DHCP MTU option; RA for IPv6
networks; …). Neutron cannot change the MTU of the device as seen from
inside the instance.


That said, most real images, including cirros since last year, should  
support the DHCP option.
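
For reference, the DHCP option in question is the Interface MTU option from RFC 2132 (code 26, length 2, a 16-bit value of at least 68). A minimal Python sketch of how it looks on the wire; this only illustrates the encoding and is not how neutron or dnsmasq builds it:

```python
import struct

DHCP_OPT_INTERFACE_MTU = 26   # option code from RFC 2132
MIN_MTU = 68                  # smallest value the RFC allows


def encode_mtu_option(mtu):
    """Encode the Interface MTU option as code, length, 16-bit value."""
    if not MIN_MTU <= mtu <= 0xFFFF:
        raise ValueError("MTU out of range for DHCP option 26")
    return struct.pack("!BBH", DHCP_OPT_INTERFACE_MTU, 2, mtu)


def decode_mtu_option(raw):
    """Decode bytes produced by encode_mtu_option back into an MTU."""
    code, length, mtu = struct.unpack("!BBH", raw)
    if code != DHCP_OPT_INTERFACE_MTU or length != 2:
        raise ValueError("not an Interface MTU option")
    return mtu


print(encode_mtu_option(1400).hex())
```

An image that ignores this option keeps whatever MTU its interface defaults to, which is why the guest side matters as much as the server side.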


Ihar



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-01-11 Thread Sean M. Collins
On Mon, Jan 11, 2016 at 12:57:05PM PST, Clark Boylan wrote:
> On Mon, Jan 11, 2016, at 12:35 PM, Sean M. Collins wrote:
> > Nice find. I actually pushed a patch recently that we should be
> > advertising the MTU by default. I think this really shows that it should
> > be enabled by default.
> > 
> > https://review.openstack.org/263486l
> >
> ++ Neutron should be able to determine what the outer MTU is and adjust
> the advertised inner MTU automatically based on the overhead required
> for whatever tunnel protocol is in use all without the deployer or cloud
> user needing to know anything special.

Right - and Neutron does when an operator explicitly enables it. I
think it's one of those things where we exercised abundant caution when
merging the feature, where we didn't enable it by default and then it
slipped through the cracks.

So - I think maybe we need to be a little more aggressive in enabling
things by default that really have no reason to not be enabled. 

Neutron should Just Work™

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-01-11 Thread Clark Boylan
On Mon, Jan 11, 2016, at 12:35 PM, Sean M. Collins wrote:
> Nice find. I actually pushed a patch recently that we should be
> advertising the MTU by default. I think this really shows that it should
> be enabled by default.
> 
> https://review.openstack.org/263486l
>
++ Neutron should be able to determine what the outer MTU is and adjust
the advertised inner MTU automatically based on the overhead required
for whatever tunnel protocol is in use all without the deployer or cloud
user needing to know anything special.

Clark



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-01-11 Thread Sean M. Collins
Nice find. I actually pushed a patch recently that we should be advertising the 
MTU by default. I think this really shows that it should be enabled by default.

https://review.openstack.org/263486l


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-01-11 Thread Ihar Hrachyshka

Ihar Hrachyshka wrote:

> Ihar Hrachyshka wrote:
>
>> Sean M. Collins wrote:
>>
>>> Just a quick update where we are:
>>>
>>> Increasing the verbosity of the SSH session into the instance that is
>>> created during the cinder portion is showing that we are actually
>>> connecting to the instance successfully. We get the dropbear SSH banner,
>>> but then the instance hangs. Eventually SSH terminates the connection, 5
>>> minutes later.
>>>
>>> http://logs.openstack.org/35/187235/12/experimental/gate-grenade-dsvm-neutron-multinode/984e651/logs/grenade.sh.txt.gz#_2016-01-08_20_13_40_040
>>
>> As per [1], could be related to mtu on the interface. Do we configure
>> MTU on external devices to accommodate for tunnelling headers?
>>
>> As per [2] neutron server logs, the network in question is vxlan.
>>
>> If that’s indeed the mtu issue, and since Cirros does not support DHCP
>> MTU option documented for ml2 at [3], I don’t know how to validate
>> whether it’s indeed the issue.
>
> UPD: ^ that’s actually not true, cirros supports the option since 0.3.3
> [1] (and we use 0.3.4 [2]), so let’s try to enforce it inside neutron and
> see whether it helps: https://review.openstack.org/265759

UPD: seems like enforcing instance mtu to 1400 indeed makes us pass forward
into tempest:

http://logs.openstack.org/59/265759/3/experimental/gate-grenade-dsvm-neutron-multinode/a167a59/console.html

And there are only three failures there:

http://logs.openstack.org/59/265759/3/experimental/gate-grenade-dsvm-neutron-multinode/a167a59/console.html#_2016-01-11_11_58_47_945

I also don’t see any RPC versioning related traces in service logs, which
is a good sign.

> [1] https://bugs.launchpad.net/cirros/+bug/1301958
> [2] http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/a5af283/logs/grenade.sh.txt.gz#_2015-11-30_18_57_46_067

>> Also, what’s the underlying infrastructure that is available in gate?
>> Does it allow vlan for tenant networks? (We could enforce vlan for the
>> network and see whether it fixes the issue.)
>>
>> [1] https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/1254085
>> [2] http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/a5af283/logs/old/screen-q-svc.txt.gz#_2015-11-30_19_28_44_685
>> [3] https://access.redhat.com/documentation/en/red-hat-enterprise-linux-openstack-platform/7/networking-guide/chapter-16-configure-mtu-settings


Ihar



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-01-11 Thread Ihar Hrachyshka

Ihar Hrachyshka wrote:

> Sean M. Collins wrote:
>
>> Just a quick update where we are:
>>
>> Increasing the verbosity of the SSH session into the instance that is
>> created during the cinder portion is showing that we are actually
>> connecting to the instance successfully. We get the dropbear SSH banner,
>> but then the instance hangs. Eventually SSH terminates the connection, 5
>> minutes later.
>>
>> http://logs.openstack.org/35/187235/12/experimental/gate-grenade-dsvm-neutron-multinode/984e651/logs/grenade.sh.txt.gz#_2016-01-08_20_13_40_040
>
> As per [1], could be related to mtu on the interface. Do we configure MTU
> on external devices to accommodate for tunnelling headers?
>
> As per [2] neutron server logs, the network in question is vxlan.
>
> If that’s indeed the mtu issue, and since Cirros does not support DHCP
> MTU option documented for ml2 at [3], I don’t know how to validate
> whether it’s indeed the issue.

UPD: ^ that’s actually not true, cirros supports the option since 0.3.3 [1]
(and we use 0.3.4 [2]), so let’s try to enforce it inside neutron and see
whether it helps: https://review.openstack.org/265759

[1] https://bugs.launchpad.net/cirros/+bug/1301958
[2] http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/a5af283/logs/grenade.sh.txt.gz#_2015-11-30_18_57_46_067

> Also, what’s the underlying infrastructure that is available in gate?
> Does it allow vlan for tenant networks? (We could enforce vlan for the
> network and see whether it fixes the issue.)
>
> [1] https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/1254085
> [2] http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/a5af283/logs/old/screen-q-svc.txt.gz#_2015-11-30_19_28_44_685
> [3] https://access.redhat.com/documentation/en/red-hat-enterprise-linux-openstack-platform/7/networking-guide/chapter-16-configure-mtu-settings


Ihar



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-01-11 Thread Ihar Hrachyshka

Sean M. Collins wrote:

> Just a quick update where we are:
>
> Increasing the verbosity of the SSH session into the instance that is
> created during the cinder portion is showing that we are actually
> connecting to the instance successfully. We get the dropbear SSH banner,
> but then the instance hangs. Eventually SSH terminates the connection, 5
> minutes later.
>
> http://logs.openstack.org/35/187235/12/experimental/gate-grenade-dsvm-neutron-multinode/984e651/logs/grenade.sh.txt.gz#_2016-01-08_20_13_40_040


As per [1], could be related to mtu on the interface. Do we configure MTU  
on external devices to accommodate for tunnelling headers?


As per [2] neutron server logs, the network in question is vxlan.

If that’s indeed the mtu issue, and since Cirros does not support DHCP MTU  
option documented for ml2 at [3], I don’t know how to validate whether it’s  
indeed the issue.


Also, what’s the underlying infrastructure that is available in gate? Does  
it allow vlan for tenant networks? (We could enforce vlan for the network  
and see whether it fixes the issue.)


[1] https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/1254085
[2]  
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/a5af283/logs/old/screen-q-svc.txt.gz#_2015-11-30_19_28_44_685
[3]  
https://access.redhat.com/documentation/en/red-hat-enterprise-linux-openstack-platform/7/networking-guide/chapter-16-configure-mtu-settings


Ihar



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2016-01-08 Thread Sean M. Collins
Just a quick update where we are:

Increasing the verbosity of the SSH session into the instance that is
created during the cinder portion is showing that we are actually
connecting to the instance successfully. We get the dropbear SSH banner,
but then the instance hangs. Eventually SSH terminates the connection, 5
minutes later.

http://logs.openstack.org/35/187235/12/experimental/gate-grenade-dsvm-neutron-multinode/984e651/logs/grenade.sh.txt.gz#_2016-01-08_20_13_40_040


-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-12-05 Thread Sean M. Collins
On Fri, Dec 04, 2015 at 01:15:07PM EST, Sean Dague wrote:
> Is *not* always due to a failure.

Yep - sorry. Friday typos :)
-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-12-04 Thread Sean Dague
On 12/04/2015 12:43 PM, Sean M. Collins wrote:
> On Mon, Nov 30, 2015 at 07:00:07AM EST, Sean Dague wrote:
>> On 11/25/2015 11:42 AM, Sean M. Collins wrote:
>>> The first run for the multinode grenade job completed.
>>>
>>> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/
>>>
>>> I'm still getting my bearings in the grenade log output, but if I
>>> understand it correctly, after upgrading Neutron, when spawning a new
>>> instance we do not get successful connectivity. The odd part is we see
>>> the failure twice. Once when upgrading Nova[1], then once when upgrading
>>> Cinder[2]. I would have thought Grenade would have exited after just
>>> failing the first time. It did do a call to worlddump.
>>
>> We're calling worlddump a bunch of times in grenade during success
>> operations to try to help track down why connectivity sometimes goes
>> away when we don't do any actions which we think should affect it. The
>> Nova operation succeeded, there is a successful ping at the end of it.
>>
>> Cinder pinged, but also does ssh verification. That failed.
> 
> Ah - OK, so doing a worlddump in a grenade job is always due to
> failure. Makes sense now, thank you. We'll need to track down why
> instances become unreachable during that stage then.

Is *not* always due to a failure.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-12-04 Thread Sean M. Collins
On Mon, Nov 30, 2015 at 07:00:07AM EST, Sean Dague wrote:
> On 11/25/2015 11:42 AM, Sean M. Collins wrote:
> > The first run for the multinode grenade job completed.
> > 
> > http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/
> > 
> > I'm still getting my bearings in the grenade log output, but if I
> > understand it correctly, after upgrading Neutron, when spawning a new
> > instance we do not get successful connectivity. The odd part is we see
> > the failure twice. Once when upgrading Nova[1], then once when upgrading
> > Cinder[2]. I would have thought Grenade would have exited after just
> > failing the first time. It did do a call to worlddump.
> 
> We're calling worlddump a bunch of times in grenade during success
> operations to try to help track down why connectivity sometimes goes
> away when we don't do any actions which we think should affect it. The
> Nova operation succeeded, there is a successful ping at the end of it.
> 
> Cinder pinged, but also does ssh verification. That failed.

Ah - OK, so doing a worlddump in a grenade job is always due to
failure. Makes sense now, thank you. We'll need to track down why
instances become unreachable during that stage then.

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-30 Thread Sean Dague
On 11/25/2015 02:45 PM, Armando M. wrote:
> 
> 
> On 25 November 2015 at 11:33, Armando M. wrote:
> 
> 
> 
> On 25 November 2015 at 10:15, Korzeniewski, Artur
> <artur.korzeniew...@intel.com> wrote:
> 
> Yes, this file is completely fine. Grenade runs the smoke
> tests before resource creation.
> The workflow is like this:
> 1. Install old devstack on main and subnode
> 2. Run tempest smoke
> 3. Create resources
> 4. Verify resources
> 5. Shutdown the services
> 6. Verify if shutdown does not affect the resources
> 7. Spin new devstack on main node, subnode stays in old version
> without any touch.
> 8. Upgrade the services - start the new code on main node
> 9. Run tempest smoke tests on upgraded main node - it in theory
> should validate if we are cross-version compatible (N and N+1)
> 10. Verify resources
> 11. Shutdown
> 
> 
> Thanks Artur, now it's clear...I can see the right versions etc.
> Having a good run to compare with helps a lot!
> 
> 
> One more thing:
> 
> http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/7c05ff0/logs/grenade.sh.txt.gz#_2015-11-25_14_31_20_846
> 
> I see no tests running, is it normal?

The final tempest run is handled by devstack-gate, so it's not in the
grenade log, it's in console.html

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-30 Thread Sean Dague
On 11/25/2015 11:42 AM, Sean M. Collins wrote:
> The first run for the multinode grenade job completed.
> 
> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/
> 
> I'm still getting my bearings in the grenade log output, but if I
> understand it correctly, after upgrading Neutron, when spawning a new
> instance we do not get successful connectivity. The odd part is we see
> the failure twice. Once when upgrading Nova[1], then once when upgrading
> Cinder[2]. I would have thought Grenade would have exited after just
> failing the first time. It did do a call to worlddump.

We're calling worlddump a bunch of times in grenade during success
operations to try to help track down why connectivity sometimes goes
away when we don't do any actions which we think should affect it. The
Nova operation succeeded, there is a successful ping at the end of it.

Cinder pinged, but also does ssh verification. That failed.

> 
> The odd part is that the first failure is a little strange - it pings
> then sleeps until it successfully gets a packet back. Which it does -
> but then it does a call to worlddump. Odd?
> 
> Anyway, then it goes on to upgrade cinder, then tries to create an
> instance and ssh in, then it fails and grenade exits completely.
> 
> I'll continue digging through the logs to see exactly why we don't get
> connectivity. I see some interesting warnings in the L3 agent log about
> cleaning up a non-existent router[3], which may be the culprit.
> 
> [1]: 
> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_20_34_06_742
> 
> [2]: 
> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_21_45_15_133
> 
> [3]: 
> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/old/screen-q-l3.txt.gz?level=WARNING
> 


-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-26 Thread Korzeniewski, Artur
I have submitted a patch for the DVR Grenade multinode job:
https://review.openstack.org/#/c/250215

Without a DVR upgrade test, we won't be able to tell whether the L3 upgrade works.
What is left to be done is DVR support in Grenade.
DVR has a multinode job, but I do not see DVR in Grenade: creation of a DVR
router should be done in the Grenade scripts.

Regards,
Artur  

-----Original Message-----
From: Sean M. Collins [mailto:s...@coreitpro.com] 
Sent: Wednesday, November 25, 2015 9:03 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial 
upgrade

On Wed, Nov 25, 2015 at 02:31:26PM EST, Armando M. wrote:
> So we fail before even attempting an upgrade?

Yeah I think so, I think we were failing at step 3 of Artur's list, creating 
the resources.

> It looks like we're testing 7.0.1.dev114, shouldn't we test from 7.0.0?

I think Grenade checks out stable/liberty - so that's probably a version string 
generated from the tip of stable/liberty 

> I am really confused, I should probably stop asking questions and do 
> some homework :)

No please keep asking - I think we're all learning things here. I'm certainly 
no expert.

--
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Sean M. Collins
On Wed, Nov 25, 2015 at 02:31:26PM EST, Armando M. wrote:
> So we fail before even attempting an upgrade?

Yeah I think so, I think we were failing at step 3 of Artur's list,
creating the resources.

> It looks like we're testing 7.0.1.dev114, shouldn't we test from 7.0.0?

I think Grenade checks out stable/liberty - so that's probably a version
string generated from the tip of stable/liberty 

> I am really confused, I should probably stop asking questions and do some
> homework :)

No please keep asking - I think we're all learning things here. I'm
certainly no expert.

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Armando M.
On 25 November 2015 at 11:33, Armando M.  wrote:

>
>
> On 25 November 2015 at 10:15, Korzeniewski, Artur <
> artur.korzeniew...@intel.com> wrote:
>
>> Yes, this file is completely fine. Grenade runs the smoke tests before
>> resource creation.
>> The workflow is like this:
>> 1. Install old devstack on main and subnode
>> 2. Run tempest smoke
>> 3. Create resources
>> 4. Verify resources
>> 5. Shutdown the services
>> 6. Verify if shutdown does not affect the resources
>> 7. Spin new devstack on main node, subnode stays in old version without
>> any touch.
>> 8. Upgrade the services - start the new code on main node
>> 9. Run tempest smoke tests on upgraded main node - it in theory should
>> validate if we are cross-version compatible (N and N+1)
>> 10. Verify resources
>> 11. Shutdown
>>
>
> Thanks Artur, now it's clear...I can see the right versions etc. Having a
> good run to compare with helps a lot!
>

One more thing:

http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/7c05ff0/logs/grenade.sh.txt.gz#_2015-11-25_14_31_20_846

I see no tests running, is it normal?


>>
>> The successful steps are in log:
>>
>> http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/7c05ff0/logs/grenade.sh.summary.txt.gz
>>
>>
>>
>> Regards,
>> Artur
>>
>> -----Original Message-----
>> From: Sean M. Collins [mailto:s...@coreitpro.com]
>> Sent: Wednesday, November 25, 2015 7:03 PM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode
>> partial upgrade
>>
>> On Wed, Nov 25, 2015 at 12:53:56PM EST, Armando M. wrote:
>> > On 25 November 2015 at 09:49, Sean M. Collins 
>> wrote:
>> >
>> > > Yeah looks like I read it wrong - the failure occurred during the
>> > > initial resource creation phase, based on comparing the logs that
>> > > Artur posted.
>> > >
>> >
>> > I see. I got confused by this:
>> >
>> > http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/testr_results.html.gz
>> >
>> > Look at the timestamp, it's from 2 days ago.
>>
>> No, that's correct. It's just taken me 2 days to get around to writing an
>> e-mail about all this :)
>>
>> --
>> Sean M. Collins
>>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Armando M.
On 25 November 2015 at 10:15, Korzeniewski, Artur <
artur.korzeniew...@intel.com> wrote:

> Yes, this file is completely fine. Grenade runs the smoke tests before
> resource creation.
> The workflow is like this:
> 1. Install old devstack on main and subnode
> 2. Run tempest smoke
> 3. Create resources
> 4. Verify resources
> 5. Shutdown the services
> 6. Verify if shutdown does not affect the resources
> 7. Spin new devstack on main node, subnode stays in old version without
> any touch.
> 8. Upgrade the services - start the new code on main node
> 9. Run tempest smoke tests on upgraded main node - it in theory should
> validate if we are cross-version compatible (N and N+1)
> 10. Verify resources
> 11. Shutdown
>

Thanks Artur, now it's clear...I can see the right versions etc. Having a
good run to compare with helps a lot!
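
To keep the versions straight, the steps quoted above can be sketched in Python as a list that tracks which code each node runs. This is only an illustration: it assumes the main node counts as upgraded from step 8 onwards, and the names used are made up for the sketch, not taken from Grenade.

```python
OLD, NEW = "old", "new"

# Steps as listed above; the flag says whether the step moves the main node
# to the new code (an assumption: only step 8 does). The subnode is never
# upgraded in a partial-upgrade run.
STEPS = [
    ("install old devstack on main and subnode", False),
    ("run tempest smoke", False),
    ("create resources", False),
    ("verify resources", False),
    ("shut down services", False),
    ("verify resources survive shutdown", False),
    ("spin new devstack on main node", False),
    ("upgrade services on main node", True),
    ("run tempest smoke against upgraded main node", False),
    ("verify resources", False),
    ("shut down", False),
]


def versions_by_step(steps):
    """Yield (step number, description, main version, subnode version)."""
    main = OLD
    for number, (description, upgrades_main) in enumerate(steps, start=1):
        if upgrades_main:
            main = NEW
        yield number, description, main, OLD


for row in versions_by_step(STEPS):
    print(row)
```

Steps 9 and 10 are the interesting ones: the main node runs new code while the subnode still runs the old release.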

>
> The successful steps are in log:
>
> http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/7c05ff0/logs/grenade.sh.summary.txt.gz
>
>
>
> Regards,
> Artur
>
> -----Original Message-----
> From: Sean M. Collins [mailto:s...@coreitpro.com]
> Sent: Wednesday, November 25, 2015 7:03 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode
> partial upgrade
>
> On Wed, Nov 25, 2015 at 12:53:56PM EST, Armando M. wrote:
> > On 25 November 2015 at 09:49, Sean M. Collins wrote:
> >
> > > Yeah looks like I read it wrong - the failure occurred during the
> > > initial resource creation phase, based on comparing the logs that
> > > Artur posted.
> > >
> >
> > I see. I got confused by this:
> >
> > http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/testr_results.html.gz
> >
> > Look at the timestamp, it's from 2 days ago.
>
> No, that's correct. It's just taken me 2 days to get around to writing an
> e-mail about all this :)
>
> --
> Sean M. Collins
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Armando M.
On 25 November 2015 at 10:03, Sean M. Collins  wrote:

> On Wed, Nov 25, 2015 at 12:53:56PM EST, Armando M. wrote:
> > On 25 November 2015 at 09:49, Sean M. Collins wrote:
> >
> > > Yeah looks like I read it wrong - the failure occurred during the
> > > initial resource creation phase, based on comparing the logs that Artur
> > > posted.
> > >
> >
> > I see. I got confused by this:
> >
> > http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/testr_results.html.gz
> >
> > Look at the timestamp, it's from 2 days ago.
>
> No, that's correct. It's just taken me 2 days to get around to writing an
> e-mail about all this :)
>

Ok, I see now...there's a phase where additional (grenade) resources are
created after having deployed and successfully tested the 'old' cloud,
correct?

So we fail before even attempting an upgrade?

From here:

http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/grenade.sh.txt.gz#_2015-11-23_18_52_18_102

I can't see any command that hints at upgrading neutron, besides here:

http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/old/screen-q-svc.txt.gz#_2015-11-23_18_35_00_569

It looks like we're testing 7.0.1.dev114, shouldn't we test from 7.0.0?

I am really confused, I should probably stop asking questions and do some
homework :)
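
For what it's worth, the anchor in the grenade.sh log link above looks like a
timestamp, so the age of a log line can be checked mechanically. A minimal
sketch, assuming the anchor has the form _YYYY-MM-DD_HH_MM_SS_mmm (inferred
only from the links in this thread); anchor_timestamp is a hypothetical helper,
not part of any OpenStack tool.

```python
from datetime import date, datetime
from urllib.parse import urlparse


def anchor_timestamp(url):
    """Parse the '#_YYYY-MM-DD_HH_MM_SS_mmm' fragment of a log URL."""
    fragment = urlparse(url).fragment
    return datetime.strptime(fragment, "_%Y-%m-%d_%H_%M_%S_%f")


url = (
    "http://logs.openstack.org/69/143169/60/experimental/"
    "gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/"
    "grenade.sh.txt.gz#_2015-11-23_18_52_18_102"
)
logged = anchor_timestamp(url)
# Age of the log line relative to the date of this email (2015-11-25).
print((date(2015, 11, 25) - logged.date()).days)  # 2
```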

--
> Sean M. Collins
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Korzeniewski, Artur
Yes, this file is completely fine. Grenade runs the smoke tests before 
resource creation.
The workflow is like this:
1. Install old devstack on main and subnode 
2. Run tempest smoke
3. Create resources
4. Verify resources
5. Shut down the services
6. Verify that the shutdown does not affect the resources
7. Spin up the new devstack on the main node; the subnode stays on the old 
version, untouched.
8. Upgrade the services - start the new code on main node
9. Run tempest smoke tests on the upgraded main node - in theory this should 
validate that we are cross-version compatible (N and N+1)
10. Verify resources
11. Shutdown
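
The steps above can be sketched as an ordered list of phases, which makes it
easy to say whether a given run ever reached the upgrade. This is only an
illustration: the phase names and the reached_upgrade helper are made up for
this example and are not Grenade's own identifiers.

```python
# Illustrative only: phase names paraphrase the eleven steps listed above;
# they are not identifiers taken from Grenade itself.
PHASES = [
    "install old devstack on main node and subnode",
    "run tempest smoke (old)",
    "create resources",
    "verify resources (old)",
    "shut down services",
    "verify resources survive shutdown",
    "install new devstack on main node only",
    "upgrade and start services on main node",
    "run tempest smoke (new main node, old subnode)",
    "verify resources (new)",
    "shutdown",
]

# Step 8 in the list above (index 7) is where the new code starts running.
UPGRADE_INDEX = 7


def reached_upgrade(failed_step):
    """Return True if a run that failed at 1-based `failed_step` got to the upgrade.

    Pass None for a run that completed every step.
    """
    if failed_step is None:
        return True
    if not 1 <= failed_step <= len(PHASES):
        raise ValueError(f"unknown step: {failed_step}")
    return failed_step - 1 >= UPGRADE_INDEX


print(reached_upgrade(3))     # False: failed while creating resources
print(reached_upgrade(None))  # True: fully successful run
```

A run that dies in step 3, like the failed one discussed in this thread, never
exercises any cross-version behaviour at all.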

The successful steps are in this log:
http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/7c05ff0/logs/grenade.sh.summary.txt.gz



Regards,
Artur

-----Original Message-----
From: Sean M. Collins [mailto:s...@coreitpro.com] 
Sent: Wednesday, November 25, 2015 7:03 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial 
upgrade

On Wed, Nov 25, 2015 at 12:53:56PM EST, Armando M. wrote:
> On 25 November 2015 at 09:49, Sean M. Collins  wrote:
> 
> > Yeah looks like I read it wrong - the failure occurred during the 
> > initial resource creation phase, based on comparing the logs that 
> > Artur posted.
> >
> 
> I see. I got confused by this:
> 
> http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/testr_results.html.gz
> 
> Look at the timestamp, it's from 2 days ago.

No, that's correct. It's just taken me 2 days to get around to writing an 
e-mail about all this :)

--
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Sean M. Collins
I'll have to run some rechecks - but perhaps the issue is stable/kilo 
faceplanting on creating the initial resources (which is what, a 30%
success rate? 1 out of 3 runs?) - the only run where we got far enough
to upgrade we were actually successful?

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Sean M. Collins
On Wed, Nov 25, 2015 at 12:53:56PM EST, Armando M. wrote:
> On 25 November 2015 at 09:49, Sean M. Collins  wrote:
> 
> > Yeah looks like I read it wrong - the failure occurred during the
> > initial resource creation phase, based on comparing the logs that Artur
> > posted.
> >
> 
> I see. I got confused by this:
> 
> http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/testr_results.html.gz
> 
> Look at the timestamp, it's from 2 days ago.

No, that's correct. It's just taken me 2 days to get around to writing an e-mail
about all this :)

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Armando M.
On 25 November 2015 at 09:49, Sean M. Collins  wrote:

> Yeah looks like I read it wrong - the failure occurred during the
> initial resource creation phase, based on comparing the logs that Artur
> posted.
>

I see. I got confused by this:

http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/logs/testr_results.html.gz

Look at the timestamp, it's from 2 days ago.

These must be files that are stale from runs that happened before this node
got reused? That's rude...


> --
> Sean M. Collins
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Sean M. Collins
On Wed, Nov 25, 2015 at 12:29:49PM EST, Korzeniewski, Artur wrote:
> I have run the multimode grenade job twice:
> Failed: [1] 
> http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/
> Success: [2] 
> http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/7c05ff0/
> 
> The [1] failed because it couldn't ssh to VM. The connectivity issue happened 
> during resource creation phase - even before upgrade.
> 

Ah - now that I've got a log of a successful run, I can see that my
run had the same issue as your failed run - we didn't even get to the
upgrade phase - we failed on the initial resource creation phase, since
my new/ directories were totally empty except for localrc.
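
That check can be written down as a small heuristic. The sketch below assumes,
based only on the runs discussed in this thread, that a run which never reached
the upgrade leaves logs/new/ with nothing but localrc; the function name and
the second file name in the example are hypothetical.

```python
def upgrade_phase_ran(new_dir_entries):
    """Heuristic: the upgrade ran if logs/new/ holds anything besides localrc.

    `new_dir_entries` is an iterable of file names found under logs/new/.
    """
    return any(name != "localrc" for name in new_dir_entries)


print(upgrade_phase_ran(["localrc"]))                         # False
print(upgrade_phase_ran(["localrc", "screen-q-svc.txt.gz"]))  # True
```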
-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Sean M. Collins
Yeah looks like I read it wrong - the failure occurred during the
initial resource creation phase, based on comparing the logs that Artur
posted.
-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Korzeniewski, Artur
From what I understand, Sean’s run did not get to the upgrade [1]: it ended 
after creating the VM and trying to ping/ssh it.
So we have no restart logs for q-agt and no logs in new [2].

The subnode will have only ‘old’ logs because the grenade multinode scenario 
assumes that the subnode will not be upgraded. That is why we will be able to 
test RPC versioning: the new server talking to the old compute/q-agt version.
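
As a rough illustration of what RPC versioning has to guarantee here, the
sketch below uses the common major.minor convention: an endpoint can serve a
request if the major versions match and the requested minor version is not
higher than the one it implements. This is a simplified model written for this
thread, not Neutron's or oslo.messaging's actual code, and the version numbers
are made up.

```python
def can_handle(implemented, requested):
    """Return True if an endpoint implementing `implemented` can serve `requested`.

    Versions are "major.minor" strings. Majors must match; the requested minor
    must not exceed the implemented minor.
    """
    imp_major, imp_minor = (int(p) for p in implemented.split("."))
    req_major, req_minor = (int(p) for p in requested.split("."))
    return imp_major == req_major and req_minor <= imp_minor


# Hypothetical numbers: an upgraded server still accepts calls from an old agent...
print(can_handle("1.4", "1.3"))  # True
# ...but an old agent cannot serve a call that needs a newer minor version.
print(can_handle("1.3", "1.4"))  # False
```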

[1] 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.summary.txt.gz
[2] 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/new/

For sanity I’m attaching my runs of the multinode grenade job:

Failed the same way as Sean’s run: 
http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/

Success:  
http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/7c05ff0/

Regards,
Artur

From: Armando M. [mailto:arma...@gmail.com]
Sent: Wednesday, November 25, 2015 6:30 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial 
upgrade



On 25 November 2015 at 08:42, Sean M. Collins <s...@coreitpro.com> wrote:
The first run for the multinode grenade job completed.

http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/

I'm still getting my bearings in the grenade log output, but if I
understand it correctly, after upgrading Neutron, when spawning a new
instance we do not get successful connectivity. The odd part is we see
the failure twice. Once when upgrading Nova[1], then once when upgrading
Cinder[2]. I would have thought Grenade would have exited after just
failing the first time. It did do a call to worlddump.

The odd part is that the first failure is a little strange - it pings
then sleeps until it successfully gets a packet back. Which it does -
but then it does a call to worlddump. Odd?

Anyway, then it goes on to upgrade cinder, then tries to create an
instance and ssh in, then it fails and grenade exits completely.

I'll continue digging through the logs to see exactly why we don't get
connectivity. I see some interesting warnings in the L3 agent log about
cleaning up a non-existent router[3], which may be the culprit.

[1]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_20_34_06_742

[2]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_21_45_15_133

[3]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/old/screen-q-l3.txt.gz?level=WARNING

Good stuff Sean.

A question:

if the workflow is: upgrade server first, run some connectivity tests to then 
proceed to upgrading the rest to then re-run tempest, where's the log of the 
new server?

All I can see is this one:

http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/old/screen-q-svc.txt.gz

And I see no restart in it.

On the subnode (compute), I only see 'old' logs.

http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/subnode-2/old/

Thoughts?


--
Sean M. Collins




Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Korzeniewski, Artur
I have run the multinode grenade job twice:
Failed: [1] 
http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/bf6bae1/
Success: [2] 
http://logs.openstack.org/69/143169/60/experimental/gate-grenade-dsvm-neutron-multinode/7c05ff0/

Run [1] failed because it couldn't ssh to the VM. The connectivity issue 
happened during the resource creation phase - even before the upgrade.

Regards,
Artur

-----Original Message-----
From: Sean M. Collins [mailto:s...@coreitpro.com] 
Sent: Wednesday, November 25, 2015 5:42 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial 
upgrade

The first run for the multinode grenade job completed.

http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/

I'm still getting my bearings in the grenade log output, but if I understand it 
correctly, after upgrading Neutron, when spawning a new instance we do not get 
successful connectivity. The odd part is we see the failure twice. Once when 
upgrading Nova[1], then once when upgrading Cinder[2]. I would have thought 
Grenade would have exited after just failing the first time. It did do a call 
to worlddump.

The odd part is that the first failure is a little strange - it pings then 
sleeps until it successfully gets a packet back. Which it does - but then it 
does a call to worlddump. Odd?

Anyway, then it goes on to upgrade cinder, then tries to create an instance and 
ssh in, then it fails and grenade exits completely.

I'll continue digging through the logs to see exactly why we don't get 
connectivity. I see some interesting warnings in the L3 agent log about 
cleaning up a non-existent router[3], which may be the culprit.

[1]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_20_34_06_742

[2]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_21_45_15_133

[3]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/old/screen-q-l3.txt.gz?level=WARNING
--
Sean M. Collins




Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Armando M.
On 25 November 2015 at 08:42, Sean M. Collins  wrote:

> The first run for the multinode grenade job completed.
>
>
> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/
>
> I'm still getting my bearings in the grenade log output, but if I
> understand it correctly, after upgrading Neutron, when spawning a new
> instance we do not get successful connectivity. The odd part is we see
> the failure twice. Once when upgrading Nova[1], then once when upgrading
> Cinder[2]. I would have thought Grenade would have exited after just
> failing the first time. It did do a call to worlddump.
>
> The odd part is that the first failure is a little strange - it pings
> then sleeps until it successfully gets a packet back. Which it does -
> but then it does a call to worlddump. Odd?
>
> Anyway, then it goes on to upgrade cinder, then tries to create an
> instance and ssh in, then it fails and grenade exits completely.
>
> I'll continue digging through the logs to see exactly why we don't get
> connectivity. I see some interesting warnings in the L3 agent log about
> cleaning up a non-existent router[3], which may be the culprit.
>
> [1]:
> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_20_34_06_742
>
> [2]:
> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_21_45_15_133
>
> [3]:
> http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/old/screen-q-l3.txt.gz?level=WARNING


Good stuff Sean.

A question:

if the workflow is: upgrade server first, run some connectivity tests to
then proceed to upgrading the rest to then re-run tempest, where's the log
of the new server?

All I can see is this one:

http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/old/screen-q-svc.txt.gz

And I see no restart in it.

On the subnode (compute), I only see 'old' logs.

http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/subnode-2/old/

Thoughts?


>
> --
> Sean M. Collins
>
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-25 Thread Sean M. Collins
The first run for the multinode grenade job completed.

http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/

I'm still getting my bearings in the grenade log output, but if I
understand it correctly, after upgrading Neutron, when spawning a new
instance we do not get successful connectivity. The odd part is we see
the failure twice. Once when upgrading Nova[1], then once when upgrading
Cinder[2]. I would have thought Grenade would have exited after just
failing the first time. It did do a call to worlddump.

The odd part is that the first failure is a little strange - it pings
then sleeps until it successfully gets a packet back. Which it does -
but then it does a call to worlddump. Odd?

Anyway, then it goes on to upgrade cinder, then tries to create an
instance and ssh in, then it fails and grenade exits completely.

I'll continue digging through the logs to see exactly why we don't get
connectivity. I see some interesting warnings in the L3 agent log about
cleaning up a non-existent router[3], which may be the culprit.

[1]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_20_34_06_742

[2]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/grenade.sh.txt.gz#_2015-11-23_21_45_15_133

[3]: 
http://logs.openstack.org/35/187235/11/experimental/gate-grenade-dsvm-neutron-multinode/011124b/logs/old/screen-q-l3.txt.gz?level=WARNING
-- 
Sean M. Collins




Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-16 Thread Sean M. Collins
Chatted with Sean D. on IRC, pushed up a patch 

https://review.openstack.org/#/c/245862/

-- 
Sean M. Collins



Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-16 Thread Sean Dague
On 11/16/2015 06:57 AM, Korzeniewski, Artur wrote:
> Thanks Sean D. for explanation!
> 
>  
> 
> I’ve taken a look into old Russell patches, and it seems that the
> project-config was already modified by him:
> 
> Add check-grenade-dsvm-partial-ncpu-neutron: (project-config)
> 
> https://review.openstack.org/#/c/189426
> 
> Add check-grenade-dsvm-partial-ncpu-neutron-dvr (project-config)
> 
> https://review.openstack.org/#/c/189727

These are not the project-config changes you want. The old partial
method is deprecated, instead you should be using multinode + grenade
(per the conversation at the top of this thread).

> Another 2 patches are introducing the Neutron partial job to devstack-gate
> 
> Add partial-ncpu-neutron grenade mode (devstack-gate)
> 
> https://review.openstack.org/#/c/189424/
> 
> Add partial-ncpu-neutron-dvr grenade mode (devstack-gate)
> 
> https://review.openstack.org/#/c/189715

Again, you don't want these patches. These are the wrong direction.

> I haven’t tested that yet, but it looks like it does the job.
> 
>  
> 
> Also, there is still one patch in Devstack needed for L3 agent separate
> start/stop:
> 
> Separate start/stop control of the Neutron L3 agent. (Devstack)
> 
> https://review.openstack.org/#/c/189710/

No, we don't need that patch.

> 
>  
> 
> From what Sean D. talked about, following patches should not be resurrected:
> 
> Support partial upgrades of Neutron in DVR mode: (Grenade)
> 
> https://review.openstack.org/#/c/189712
> 
>  Support partial Neutron upgrades. (Grenade)
> 
> https://review.openstack.org/#/c/189417/

No, you don't want those either.

> In order to test the RPC right, we should be able to decouple the
> neutron server from its agents – L2, L3, DHCP and metadata agents.
> 
> Current scenario will let us to test :
> 
> 1.   Legacy:
> 
> a.   Controller & network node: neutron server, L2, L3, Metadata and
> DHCP agents
> 
> b.  Compute node: L2 agent.
> 
> 2.   DVR:
> 
> a.   Controller & network node: neutron server, L2, L3, Metadata and
> DHCP agents
> 
> b.  Compute node: L2, L3, Metadata(?) agents
> 
>  
> 
> We can start with current scenario, but this does not guarantee us to
> test of DHCP RPC.
> 
>  
> 
> The ideal upgrade scenario should look like this:
> 
> 1.   Legacy:
> 
> a.   Controller node: neutron server
> 
> b.  Network node: L2, L3, Metadata and DHCP server
> 
> c.   Compute node: L2 agent
> 
> 2.   DVR:
> 
> a.   Controller node: neutron server
> 
> b.  Network node: L2, L3, Metadata and DHCP server
> 
> c.   Compute node: L2, L3 and Metadata agent
> 
>  
> 
> The job still to be done in order to fully test partial upgrades:
> 
> -  Decouple the DHCP and metadata agent from devstack neutron
> restart
> 
> -  Look through the grenade Neutron code in order to identify if
> we are creating the all the resources critical to test the upgrades
> 
> -  Debug, debug, debug…
> 
>  
> 
> Regards,
> 
> Artur
> 
> *From:*Armando M. [mailto:arma...@gmail.com]
> *Sent:* Friday, November 13, 2015 9:37 PM
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [nova][neutron][upgrade] Grenade
> multinode partial upgrade
> 
>  
> 
>  
> 
>  
> 
> On 13 November 2015 at 11:46, Sean Dague <s...@dague.net> wrote:
> 
> On 11/13/2015 01:16 PM, Sean M. Collins wrote:
> > On Fri, Nov 13, 2015 at 07:42:12AM EST, Sean Dague wrote:
> >> Ok, I top responded with the details of the job, honestly I think it's
> >> just a project-config change to get up and running, and then hacking at
> >> the bugs that fall out.
> >
> > Thanks - that was super helpful.
> >
> > I'm thinking of working on the following on Monday:
> >
> > 1) capture that somewhere in the upgrade docs we're putting together in
> > neutron's devref
> >
> > 2) Adding the stanza to project-config to get grenade running for
> > Neutron
> >
> > 3) Take a look at the patches that Armando linked a couple emails back
> > in this thread.
> 
> I don't think that any of the patches listed there are needed. This was
> part of the reason I -2ed that direction in the last cycle. It required
> a separate special code path for partial upgrade setup which was very
> synthetic (and honestly kind of confusing to debug).
> 
>  
> 

Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-16 Thread Korzeniewski, Artur
Thanks Sean D. for explanation!

I’ve taken a look at Russell’s old patches, and it seems that he already 
modified project-config:

Add check-grenade-dsvm-partial-ncpu-neutron: (project-config)

https://review.openstack.org/#/c/189426

Add check-grenade-dsvm-partial-ncpu-neutron-dvr (project-config)

https://review.openstack.org/#/c/189727



Two more patches introduce the Neutron partial job to devstack-gate:

Add partial-ncpu-neutron grenade mode (devstack-gate)

https://review.openstack.org/#/c/189424/

Add partial-ncpu-neutron-dvr grenade mode (devstack-gate)

https://review.openstack.org/#/c/189715

I haven’t tested that yet, but it looks like it does the job.

Also, one patch in Devstack is still needed for separate start/stop of the 
L3 agent:

Separate start/stop control of the Neutron L3 agent. (Devstack)

https://review.openstack.org/#/c/189710/

From what Sean D. said, the following patches should not be resurrected:

Support partial upgrades of Neutron in DVR mode: (Grenade)

https://review.openstack.org/#/c/189712

 Support partial Neutron upgrades. (Grenade)

https://review.openstack.org/#/c/189417/

In order to test the RPC properly, we should be able to decouple the neutron 
server from its agents – L2, L3, DHCP and metadata agents.
The current scenario lets us test:

1.   Legacy:

a.   Controller & network node: neutron server, L2, L3, Metadata and DHCP 
agents

b.  Compute node: L2 agent.

2.   DVR:

a.   Controller & network node: neutron server, L2, L3, Metadata and DHCP 
agents

b.  Compute node: L2, L3, Metadata(?) agents

We can start with the current scenario, but it does not guarantee that we test 
the DHCP RPC.

The ideal upgrade scenario should look like this:

1.   Legacy:

a.   Controller node: neutron server

b.  Network node: L2, L3, Metadata and DHCP server

c.   Compute node: L2 agent

2.   DVR:

a.   Controller node: neutron server

b.  Network node: L2, L3, Metadata and DHCP server

c.   Compute node: L2, L3 and Metadata agent
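
To make the difference between the two layouts concrete, here is a small sketch
that lists which agents stay on the old code while the neutron server is
upgraded, i.e. which server-to-agent RPC paths get cross-version coverage. It
covers the legacy case only and assumes that just the node running the neutron
server is upgraded; the dictionary layout and the old_agents helper are
invented for this illustration.

```python
# Services per node for the legacy (non-DVR) case, as listed above.
CURRENT = {
    "controller": {"server", "l2", "l3", "metadata", "dhcp"},
    "compute": {"l2"},
}
IDEAL = {
    "controller": {"server"},
    "network": {"l2", "l3", "metadata", "dhcp"},
    "compute": {"l2"},
}


def old_agents(layout, upgraded_nodes):
    """Agents that keep running old code on nodes that are not upgraded."""
    agents = set()
    for node, services in layout.items():
        if node not in upgraded_nodes:
            agents |= services - {"server"}
    return agents


# Assumption: only the node hosting the neutron server is upgraded.
print(sorted(old_agents(CURRENT, {"controller"})))  # ['l2']
print(sorted(old_agents(IDEAL, {"controller"})))    # ['dhcp', 'l2', 'l3', 'metadata']
```

With the current layout the DHCP agent is upgraded together with the server,
which is why its RPC is not exercised across versions.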

Work still to be done in order to fully test partial upgrades:

-  Decouple the DHCP and metadata agent from devstack neutron restart

-  Look through the grenade Neutron code to check whether we are creating 
all the resources critical to testing the upgrades

-  Debug, debug, debug…

Regards,
Artur
From: Armando M. [mailto:arma...@gmail.com]
Sent: Friday, November 13, 2015 9:37 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial 
upgrade



On 13 November 2015 at 11:46, Sean Dague <s...@dague.net> wrote:
On 11/13/2015 01:16 PM, Sean M. Collins wrote:
> On Fri, Nov 13, 2015 at 07:42:12AM EST, Sean Dague wrote:
>> Ok, I top responded with the details of the job, honestly I think it's
>> just a project-config change to get up and running, and then hacking at
>> the bugs that fall out.
>
> Thanks - that was super helpful.
>
> I'm thinking of working on the following on Monday:
>
> 1) capture that somewhere in the upgrade docs we're putting together in 
> neutron's devref
>
> 2) Adding the stanza to project-config to get grenade running for
> Neutron
>
> 3) Take a look at the patches that Armando linked a couple emails back
> in this thread.

I don't think that any of the patches listed there are needed. This was
part of the reason I -2ed that direction in the last cycle. It required
a separate special code path for partial upgrade setup which was very
synthetic (and honestly kind of confusing to debug).

I don't disagree. I didn't mean to imply 'resume the patches', I was only 
providing the backdrop.


The new approach means if you did upgrade for the all-in-one case, and
you did multinode setup with worker processes on the subnode, you just
make a config where you do them both at the same time, and you have
partial upgrade.

-Sean

--
Sean Dague
http://dague.net


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-13 Thread Armando M.
On 13 November 2015 at 11:46, Sean Dague  wrote:

> On 11/13/2015 01:16 PM, Sean M. Collins wrote:
> > On Fri, Nov 13, 2015 at 07:42:12AM EST, Sean Dague wrote:
> >> Ok, I top responded with the details of the job, honestly I think it's
> >> just a project-config change to get up and running, and then hacking at
> >> the bugs that fall out.
> >
> > Thanks - that was super helpful.
> >
> > I'm thinking of working on the following on Monday:
> >
> > 1) capture that somewhere in the upgrade docs we're putting together in
> > neutron's devref
> >
> > 2) Adding the stanza to project-config to get grenade running for
> > Neutron
> >
> > 3) Take a look at the patches that Armando linked a couple emails back
> > in this thread.
>
> I don't think that any of the patches listed there are needed. This was
> part of the reason I -2ed that direction in the last cycle. It required
> a separate special code path for partial upgrade setup which was very
> synthetic (and honestly kind of confusing to debug).
>

I don't disagree. I didn't mean to imply 'resume the patches', I was only
providing the backdrop.


>
> The new approach means if you did upgrade for the all-in-one case, and
> you did multinode setup with worker processes on the subnode, you just
> make a config where you do them both at the same time, and you have
> partial upgrade.
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-13 Thread Sean Dague
On 11/13/2015 01:16 PM, Sean M. Collins wrote:
> On Fri, Nov 13, 2015 at 07:42:12AM EST, Sean Dague wrote:
>> Ok, I top responded with the details of the job, honestly I think it's
>> just a project-config change to get up and running, and then hacking at
>> the bugs that fall out.
> 
> Thanks - that was super helpful. 
> 
> I'm thinking of working on the following on Monday:
> 
> 1) capture that somewhere in the upgrade docs we're putting together in 
> neutron's devref
> 
> 2) Adding the stanza to project-config to get grenade running for
> Neutron
> 
> 3) Take a look at the patches that Armando linked a couple emails back
> in this thread.

I don't think that any of the patches listed there are needed. This was
part of the reason I -2ed that direction in the last cycle. It required
a separate special code path for partial upgrade setup which was very
synthetic (and honestly kind of confusing to debug).

The new approach means that if you already support upgrade for the
all-in-one case, and you support multinode setup with worker processes
on the subnode, you just make a config where you do them both at the
same time, and you have a partial upgrade.

-Sean

-- 
Sean Dague
http://dague.net


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-13 Thread Sean M. Collins
On Fri, Nov 13, 2015 at 07:42:12AM EST, Sean Dague wrote:
> Ok, I top responded with the details of the job, honestly I think it's
> just a project-config change to get up and running, and then hacking at
> the bugs that fall out.

Thanks - that was super helpful. 

I'm thinking of working on the following on Monday:

1) Capture that somewhere in the upgrade docs we're putting together in
neutron's devref

2) Add the stanza to project-config to get grenade running for
Neutron

3) Take a look at the patches that Armando linked a couple emails back
in this thread.

-- 
Sean M. Collins


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-13 Thread Ihar Hrachyshka

Sean Dague wrote:

> On 11/13/2015 04:08 AM, Ihar Hrachyshka wrote:
>> Armando M. wrote:
>>
>>> On 12 November 2015 at 20:24, Sean M. Collins wrote:
>>> On Thu, Nov 12, 2015 at 05:55:51PM EST, Ihar Hrachyshka wrote:
>>> > I also believe that the first step to get the job set is making
>>> > neutron own its grenade future, by migrating to grenade plugin
>>> > maintained in neutron tree.
>>>
>>> I'd like to see what Sean Dague thinks of this - my worry is that if we
>>> start pulling things into Neutron we lose valuable insight from people
>>> who know a lot about Grenade.
>>>
>>> Not to mention, Sean and I have had conversations about trying to get
>>> Neutron as the default for DevStack - we can't just take our ball and go
>>> in our own corner.
>>>
>>> Agreed. (I feel like) we had a good discussion at the summit about
>>> this: we clearly have key pieces that are and will stay within the
>>> realm of both devstack and grenade.
>>
>> Agreed that it’s worth clarifying with grenade folks what should be
>> included in grenade plugin, and what belongs to core grenade; and where
>> multinode ‘partial’ job stands in this regard.
>
> Ok, I top responded with the details of the job, honestly I think it's
> just a project-config change to get up and running, and then hacking at
> the bugs that fall out.
>
> Much like with devstack, I think that neutron core service configuration
> / upgrade should stay in the tree. We want more people familiar with it,
> and I really do want to get us over to Neutron by default some day
> (hopefully not too far off). I'm quite hopeful about the work Sean
> Collins is doing there.
>
> The advanced services should all be in plugins. I think we fully removed
> them from base grenade testing last cycle.
>
> There is a lot of coupling in the set of projects that need to cooperate
> to get computes on the network, so debugging and fixing those issues
> often depends on understanding the whole collection. Having the code
> that does that and the ability to bugfix in one place means we can turn
> around breaks faster. It also means that everyone in the ecosystem
> becomes familiar with all of that by default.


Cool, thanks for the clarification. I was under the impression that all
projects should adopt plugins, not just those considered not part of core.
Also glad to hear it’s a matter of an infra patch for a new job. I think
we’ll roll it from here.


Thanks
Ihar


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-13 Thread Sean Dague
On 11/13/2015 04:08 AM, Ihar Hrachyshka wrote:
> Armando M.  wrote:
> 
>>
>>
>> On 12 November 2015 at 20:24, Sean M. Collins  wrote:
>> On Thu, Nov 12, 2015 at 05:55:51PM EST, Ihar Hrachyshka wrote:
>> > I also believe that the first step to get the job set is making
>> neutron own
>> > its grenade future, by migrating to grenade plugin maintained in
>> neutron
>> > tree.
>>
>> I'd like to see what Sean Dague thinks of this - my worry is that if we
>> start pulling things into Neutron we lose valuable insight from people
>> who know a lot about Grenade.
>>
>> Not to mention, Sean and I have had conversations about trying to get
>> Neutron as the default for DevStack - we can't just take our ball and go
>> in our own corner.
>>
>> Agreed. (I feel like) we had a good discussion at the summit about
>> this: we clearly have key pieces that are and will stay within the
>> realm of both devstack and grenade.
> 
> Agreed that it’s worth clarifying with grenade folks what should be
> included in grenade plugin, and what belongs to core grenade; and where
> multinode ‘partial’ job stands in this regard.

Ok, I top responded with the details of the job, honestly I think it's
just a project-config change to get up and running, and then hacking at
the bugs that fall out.

Much like with devstack, I think that neutron core service configuration
/ upgrade should stay in the tree. We want more people familiar with it,
and I really do want to get us over to Neutron by default some day
(hopefully not too far off). I'm quite hopeful about the work Sean
Collins is doing there.

The advanced services should all be in plugins. I think we fully removed
them from base grenade testing last cycle.

There is a lot of coupling in the set of projects that need to cooperate
to get computes on the network, so debugging and fixing those issues
often depends on understanding the whole collection. Having the code
that does that and the ability to bugfix in one place means we can turn
around breaks faster. It also means that everyone in the ecosystem
becomes familiar with all of that by default.

-Sean

-- 
Sean Dague
http://dague.net


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-13 Thread Sean Dague
On 11/12/2015 02:41 PM, Korzeniewski, Artur wrote:
> Hi Sean,
> 
> I’m interested in introducing to Neutron the multinode partial upgrade
> job in Grenade.
> 
>  
> 
> Can you explain how multinode is currently working in Grenade and how
> Nova is doing the partial upgrade?

We're hopefully a couple of days out from making this voting. Here is
how it works conceptually:

Grenade itself knows how to upgrade 1 node. For simplicity's sake we've
left it that way. Putting a real orchestration layer into Grenade would
be... potentially challenging. However, Grenade implicitly runs stack.sh
today to make developers' lives easier.

Devstack-gate knows how to set up 2 nodes, and has the rest of the
multinode logic. Under a grenade multinode environment we:

* Allocate 2 nodes

* Set up all the source trees and configs correctly on all the nodes
(which includes fewer services on the workers) -
https://github.com/openstack-infra/devstack-gate/blob/92d130938406e4b42cdb1fe3e6fa62f3a2466024/devstack-vm-gate.sh#L206-L222

* Then we give Grenade a post-stack.sh script which includes the logic
to run stack.sh on all the subnodes at the right time -
https://github.com/openstack-infra/devstack-gate/blob/92d130938406e4b42cdb1fe3e6fa62f3a2466024/devstack-vm-gate.sh#L587-L597

* We run grenade

* It runs stack.sh on the main node, runs stack.sh on the subnode
because of post-stack.sh, then proceeds to:

   - run tempest smoke
   - create long-running resources
   - shut down the controller / upgrade / restart
   - verify the long-running resources are still there
   - run tempest smoke
   - report success

It completely ignores the subnode after post-stack.sh, which means the
subnode will continue soldiering on as the stable version.
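
The flow described above can be sketched as a small simulation. This is
only an illustrative Python model, not code from Grenade or
devstack-gate: the Node and GrenadeRun classes, the "stable" / "target"
labels, and the step strings are all hypothetical names made up for the
example. It shows the property that makes the job a partial upgrade:
only the controller is upgraded, so the subnode stays on the stable
release.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the old and new releases (not Grenade names).
STABLE = "stable"
TARGET = "target"


@dataclass
class Node:
    """A toy stand-in for one machine in the multinode job."""
    name: str
    version: str = STABLE
    stacked: bool = False


@dataclass
class GrenadeRun:
    """Toy model of the step ordering described in the message above."""
    controller: Node
    subnodes: list
    log: list = field(default_factory=list)

    def stack(self, node):
        # Stands in for running stack.sh on a node.
        node.stacked = True
        self.log.append(f"stack.sh on {node.name} ({node.version})")

    def post_stack(self):
        # Stands in for the post-stack.sh hook: stack every subnode.
        for sub in self.subnodes:
            self.stack(sub)

    def run(self):
        self.stack(self.controller)
        self.post_stack()
        self.log.append("tempest smoke")
        self.log.append("create long-running resources")
        # Only the controller is upgraded; subnodes are never touched again.
        self.controller.version = TARGET
        self.log.append(f"upgrade {self.controller.name} to {TARGET}")
        self.log.append("verify long-running resources")
        self.log.append("tempest smoke")
        return self.log


def simulate_partial_upgrade():
    """Run the toy flow with one controller and one subnode."""
    controller = Node("controller")
    subnode = Node("subnode")
    run = GrenadeRun(controller, [subnode])
    run.run()
    return controller, subnode, run.log


if __name__ == "__main__":
    controller, subnode, log = simulate_partial_upgrade()
    for step in log:
        print(step)
    print(f"controller={controller.version} subnode={subnode.version}")
```

In this model nothing after post_stack() refers to the subnodes, which
mirrors the point that the subnode keeps running the stable version
while the controller moves to the new one.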

That means support for a new partial upgrade scenario is really only 2
things:

1. generic multinode support for the collection of services you want
(i.e. a definition of what's in the subnode).
2. support in grenade (or via a plugin) for upgrading the entire
controller node.

Fortunately, I believe Neutron already has #1 and #2 because of existing
jobs, so getting a multinode grenade job should just be a matter of a
project-config stanza. No additional code needs to be written. I'm sure
there might be some bugs (there always are), but getting rolling
shouldn't be too bad.

-Sean

-- 
Sean Dague
http://dague.net


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-13 Thread Ihar Hrachyshka

Armando M. wrote:

> On 12 November 2015 at 20:24, Sean M. Collins wrote:
> On Thu, Nov 12, 2015 at 05:55:51PM EST, Ihar Hrachyshka wrote:
> > I also believe that the first step to get the job set is making neutron own
> > its grenade future, by migrating to grenade plugin maintained in neutron
> > tree.
>
> I'd like to see what Sean Dague thinks of this - my worry is that if we
> start pulling things into Neutron we lose valuable insight from people
> who know a lot about Grenade.
>
> Not to mention, Sean and I have had conversations about trying to get
> Neutron as the default for DevStack - we can't just take our ball and go
> in our own corner.
>
> Agreed. (I feel like) we had a good discussion at the summit about this:
> we clearly have key pieces that are and will stay within the realm of
> both devstack and grenade.

Agreed that it’s worth clarifying with grenade folks what should be
included in grenade plugin, and what belongs to core grenade; and where
multinode ‘partial’ job stands in this regard.


Ihar


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-12 Thread Armando M.
On 12 November 2015 at 20:24, Sean M. Collins  wrote:

> On Thu, Nov 12, 2015 at 05:55:51PM EST, Ihar Hrachyshka wrote:
> > I also believe that the first step to get the job set is making neutron
> own
> > its grenade future, by migrating to grenade plugin maintained in neutron
> > tree.
>
> I'd like to see what Sean Dague thinks of this - my worry is that if we
> start pulling things into Neutron we lose valuable insight from people
> who know a lot about Grenade.


> Not to mention, Sean and I have had conversations about trying to get
> Neutron as the default for DevStack - we can't just take our ball and go
> in our own corner.
>

Agreed. (I feel like) we had a good discussion at the summit about this: we
clearly have key pieces that are and will stay within the realm of both
devstack and grenade.


>
> --
> Sean M. Collins
>


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-12 Thread Sean M. Collins
On Thu, Nov 12, 2015 at 05:55:51PM EST, Ihar Hrachyshka wrote:
> I also believe that the first step to get the job set is making neutron own
> its grenade future, by migrating to grenade plugin maintained in neutron
> tree.

I'd like to see what Sean Dague thinks of this - my worry is that if we
start pulling things into Neutron we lose valuable insight from people
who know a lot about Grenade.

Not to mention, Sean and I have had conversations about trying to get
Neutron as the default for DevStack - we can't just take our ball and go
in our own corner.

-- 
Sean M. Collins


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-12 Thread Ihar Hrachyshka

Artur wrote:

> Hi Sean,
> I’m interested in introducing to Neutron the multinode partial upgrade
> job in Grenade.
>
> Can you explain how multinode is currently working in Grenade and how
> Nova is doing the partial upgrade?




Let’s work on this in the upgrade subteam first, and reach out to folks  
only if we are not clear about some specifics. Sean Collins already  
expressed his interest in it, so we should make sure you sync on the  
feature.


I also believe that the first step to get the job set is making neutron own  
its grenade future, by migrating to grenade plugin maintained in neutron  
tree. We were probably not clear about that before, hence your immediate  
interest in ‘partial’ details.


Let’s sync tomorrow in irc on the matter.

Ihar


Re: [openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

2015-11-12 Thread Armando M.
On 12 November 2015 at 11:41, Korzeniewski, Artur <
artur.korzeniew...@intel.com> wrote:

> Hi Sean,
>
> I’m interested in introducing to Neutron the multinode partial upgrade job
> in Grenade.
>

Great to hear!


>
>
> Can you explain how multinode is currently working in Grenade and how Nova
> is doing the partial upgrade?
>

sc68cal and garyk may also be good contacts to reach out to for help on
this (well overdue) initiative. There were a number of patches, started by
russellb, that needed some love. Did you by any chance look into them?

https://review.openstack.org/#/c/220649/
https://review.openstack.org/#/c/189710/
https://review.openstack.org/#/c/189417/
https://review.openstack.org/#/c/189712/

I hope I have not forgotten any.

Cheers,
Armando



>
>
> Regards,
>
> Artur Korzeniewski
>
> IRC: korzen
>
> 
>
> Intel Technology Poland sp. z o.o.
>
> KRS 101882
>
> ul. Slowackiego 173, 80-298 Gdansk
>
>
>
>
>