Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-11-01 Thread Derek Higgins
On Wed, 31 Oct 2018 at 17:22, Alex Schultz  wrote:
>
> Hey everyone,
>
> Based on previous emails around this[0][1], I have proposed a possible
> reducing in our usage by switching the scenario001--011 jobs to
> non-voting and removing them from the gate[2]. This will reduce the
> likelihood of causing gate resets and hopefully allow us to land
> corrective patches sooner.  In terms of risks, there is a risk that we
> might introduce breaking changes in the scenarios because they are
> officially non-voting, and we will still be gating promotions on these
> scenarios.  This means that if they are broken, they will need the
> same attention and care to fix them so we should be vigilant when the
> jobs are failing.
>
> The hope is that we can switch these scenarios out with voting
> standalone versions in the next few weeks, but until that I think we
> should proceed by removing them from the gate.  I know this is less
> than ideal but as most failures with these jobs in the gate are either
> timeouts or unrelated to the changes (or gate queue), they are more of
> hindrance than a help at this point.
>
> Thanks,
> -Alex

While on the topic of reducing the CI footprint

something worth considering when pushing up a string of patches would
be to remove a bunch of the check jobs at the start of the patch set.

e.g. If I'm working on t-h-t and have a series of 10 patches, while
looking for feedback I could remove most of the jobs from
zuul.d/layout.yaml in patch 1 so all 10 patches don't run the entire
suite of CI jobs. Once it becomes clear that the patchset is nearly
ready to merge, I change patch 1 leave zuul.d/layout.yaml as is.

I'm not suggesting everybody does this but anybody who tends to push
up multiple patch sets together could consider it to not tie up
resources for hours.

>
> [0] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> [1] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> [2] 
> https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] status of the zuulv3 job migration

2018-09-24 Thread Derek Higgins
On Fri, 21 Sep 2018 at 11:16, Derek Higgins  wrote:

> Just quick summary of the status and looking for some input about the
> experimental jobs
>
> 15 jobs are now done, with another 2 ready for reviewing. This leaves 6
> jobs
> 1 x multinode job
>I've yet to finished porting this one
> 2 x grenade jobs
>Last time I looked grenade jobs couldn't yet be ported to zuulv3 native
> but I'll investigate further
>
> 3 x experimental jobs
> (ironic-dsvm-functional, ironic-tempest-dsvm-parallel, 
> ironic-tempest-dsvm-pxe_ipa-full)
> These don't currently pass and it doesn't look like anybody is using
> them, So I'd like to know if there is anybody out there interested in them,
> if not I'll go ahead and remove them.
>
Job removal proposed here https://review.openstack.org/#/c/591675/

>
> thanks,
> Derek.
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [ironic] status of the zuulv3 job migration

2018-09-21 Thread Derek Higgins
Just quick summary of the status and looking for some input about the
experimental jobs

15 jobs are now done, with another 2 ready for reviewing. This leaves 6
jobs
1 x multinode job
   I've yet to finished porting this one
2 x grenade jobs
   Last time I looked grenade jobs couldn't yet be ported to zuulv3 native
but I'll investigate further

3 x experimental jobs
(ironic-dsvm-functional, ironic-tempest-dsvm-parallel,
ironic-tempest-dsvm-pxe_ipa-full)
These don't currently pass and it doesn't look like anybody is using
them, So I'd like to know if there is anybody out there interested in them,
if not I'll go ahead and remove them.

thanks,
Derek.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo]Testing ironic in the overcloud

2018-06-28 Thread Derek Higgins
On 23 February 2018 at 14:48, Derek Higgins  wrote:

>
>
> On 1 February 2018 at 16:18, Emilien Macchi  wrote:
>
>> On Thu, Feb 1, 2018 at 8:05 AM, Derek Higgins  wrote:
>> [...]
>>
>>> o Should I create a new tempest test for baremetal as some of the
>>>>> networking stuff is different?
>>>>>
>>>>
>>>> I think we would need to run baremetal tests for this new featureset,
>>>> see existing files for examples.
>>>>
>>> Do you mean that we should use existing tests somewhere or create new
>>> ones?
>>>
>>
>> I mean we should use existing tempest tests from ironic, etc. Maybe just
>> a baremetal scenario that spawn a baremetal server and test ssh into it,
>> like we already have with other jobs.
>>
> Done, the current set of patches sets up a new non voting job
> "tripleo-ci-centos-7-scenario011-multinode-oooq-container" which setup up
> ironic in the overcloud and run the ironic tempest job
> "ironic_tempest_plugin.tests.scenario.test_baremetal_basic_
> ops.BaremetalBasicOps.test_baremetal_server_ops"
>
> its currently passing so I'd appreciate a few eyes on it before it becomes
> out of date again
> there are 4 patches starting here https://review.openstack.
> org/#/c/509728/19
>

This is now working again so If anybody has the time I'd appreciate some
reviews while its still current
See scenario011 on https://review.openstack.org/#/c/509728/




>
>
>>
>> o Is running a script on the controller with NodeExtraConfigPost the best
>>>>> way to set this up or should I be doing something with quickstart? I don't
>>>>> think quickstart currently runs things on the controler does it?
>>>>>
>>>>
>>>> What kind of thing do you want to run exactly?
>>>>
>>> The contents to this file will give you an idea, somewhere I need to
>>> setup a node that ironic will control with ipmi
>>> https://review.openstack.org/#/c/485261/19/ci/common/vbmc_setup.yaml
>>>
>>
>> extraconfig works for me in that case, I guess. Since we don't productize
>> this code and it's for CI only, it can live here imho.
>>
>> Thanks,
>> --
>> Emilien Macchi
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib
>> e
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Ironic Inspector in the overcloud

2018-04-20 Thread Derek Higgins
On 18 April 2018 at 17:12, Derek Higgins <der...@redhat.com> wrote:

>
>
> On 18 April 2018 at 14:22, Bogdan Dobrelya <bdobr...@redhat.com> wrote:
>
>> On 4/18/18 12:07 PM, Derek Higgins wrote:
>>
>>> Hi All,
>>>
>>> I've been testing the ironic inspector containerised service in the
>>> overcloud, the service essentially works but there is a couple of hurdles
>>> to tackle to set it up, the first of these is how to get  the IPA kernel
>>> and ramdisk where they need to be.
>>>
>>> These need to be be present in the ironic_pxe_http container to be
>>> served out over http, whats the best way to get them there?
>>>
>>> On the undercloud this is done by copying the files across the
>>> filesystem[1][2] to /httpboot  when we run "openstack overcloud image
>>> upload", but on the overcloud an alternative is required, could the files
>>> be pulled into the container during setup?
>>>
>>
>> I'd prefer keep bind-mounting IPA kernel and ramdisk into a container via
>> the /var/lib/ironic/httpboot host-path. So the question then becomes how to
>> deliver those by that path for overcloud nodes?
>>
> Yup it does, I'm currently looking into using DeployArtifactURLs to
> download the files to the controller nodes
>
It turns out this wont work as Deploy artifacts downloads to all hosts
which we don't want,

I'm instead going to propose we add a docker config to download the files
over http, by default it will use the same images that were used by the
undercloud
https://review.openstack.org/#/c/563072/1



>
>
>>
>>
>>> thanks,
>>> Derek
>>>
>>> 1 - https://github.com/openstack/python-tripleoclient/blob/3cf44
>>> eb/tripleoclient/v1/overcloud_image.py#L421-L433
>>> 2 - https://github.com/openstack/python-tripleoclient/blob/3cf44
>>> eb/tripleoclient/v1/overcloud_image.py#L181
>>>
>>> 
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: openstack-dev-requ...@lists.op
>>> enstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>>
>> --
>> Best regards,
>> Bogdan Dobrelya,
>> Irc #bogdando
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib
>> e
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Ironic Inspector in the overcloud

2018-04-18 Thread Derek Higgins
On 18 April 2018 at 14:22, Bogdan Dobrelya <bdobr...@redhat.com> wrote:

> On 4/18/18 12:07 PM, Derek Higgins wrote:
>
>> Hi All,
>>
>> I've been testing the ironic inspector containerised service in the
>> overcloud, the service essentially works but there is a couple of hurdles
>> to tackle to set it up, the first of these is how to get  the IPA kernel
>> and ramdisk where they need to be.
>>
>> These need to be be present in the ironic_pxe_http container to be served
>> out over http, whats the best way to get them there?
>>
>> On the undercloud this is done by copying the files across the
>> filesystem[1][2] to /httpboot  when we run "openstack overcloud image
>> upload", but on the overcloud an alternative is required, could the files
>> be pulled into the container during setup?
>>
>
> I'd prefer keep bind-mounting IPA kernel and ramdisk into a container via
> the /var/lib/ironic/httpboot host-path. So the question then becomes how to
> deliver those by that path for overcloud nodes?
>
Yup it does, I'm currently looking into using DeployArtifactURLs to
download the files to the controller nodes


>
>
>> thanks,
>> Derek
>>
>> 1 - https://github.com/openstack/python-tripleoclient/blob/3cf44
>> eb/tripleoclient/v1/overcloud_image.py#L421-L433
>> 2 - https://github.com/openstack/python-tripleoclient/blob/3cf44
>> eb/tripleoclient/v1/overcloud_image.py#L181
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscrib
>> e
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] Ironic Inspector in the overcloud

2018-04-18 Thread Derek Higgins
Hi All,

I've been testing the ironic inspector containerised service in the
overcloud, the service essentially works but there is a couple of hurdles
to tackle to set it up, the first of these is how to get  the IPA kernel
and ramdisk where they need to be.

These need to be be present in the ironic_pxe_http container to be served
out over http, whats the best way to get them there?

On the undercloud this is done by copying the files across the
filesystem[1][2] to /httpboot  when we run "openstack overcloud image
upload", but on the overcloud an alternative is required, could the files
be pulled into the container during setup?

thanks,
Derek

1 -
https://github.com/openstack/python-tripleoclient/blob/3cf44eb/tripleoclient/v1/overcloud_image.py#L421-L433
2 -
https://github.com/openstack/python-tripleoclient/blob/3cf44eb/tripleoclient/v1/overcloud_image.py#L181
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo]Testing ironic in the overcloud

2018-02-23 Thread Derek Higgins
On 1 February 2018 at 16:18, Emilien Macchi <emil...@redhat.com> wrote:

> On Thu, Feb 1, 2018 at 8:05 AM, Derek Higgins <der...@redhat.com> wrote:
> [...]
>
>> o Should I create a new tempest test for baremetal as some of the
>>>> networking stuff is different?
>>>>
>>>
>>> I think we would need to run baremetal tests for this new featureset,
>>> see existing files for examples.
>>>
>> Do you mean that we should use existing tests somewhere or create new
>> ones?
>>
>
> I mean we should use existing tempest tests from ironic, etc. Maybe just a
> baremetal scenario that spawn a baremetal server and test ssh into it, like
> we already have with other jobs.
>
Done, the current set of patches sets up a new non voting job
"tripleo-ci-centos-7-scenario011-multinode-oooq-container" which setup up
ironic in the overcloud and run the ironic tempest job
"ironic_tempest_plugin.tests.scenario.test_baremetal_basic_ops.BaremetalBasicOps.test_baremetal_server_ops"

its currently passing so I'd appreciate a few eyes on it before it becomes
out of date again
there are 4 patches starting here https://review.openstack.org/#/c/509728/19


>
> o Is running a script on the controller with NodeExtraConfigPost the best
>>>> way to set this up or should I be doing something with quickstart? I don't
>>>> think quickstart currently runs things on the controler does it?
>>>>
>>>
>>> What kind of thing do you want to run exactly?
>>>
>> The contents to this file will give you an idea, somewhere I need to
>> setup a node that ironic will control with ipmi
>> https://review.openstack.org/#/c/485261/19/ci/common/vbmc_setup.yaml
>>
>
> extraconfig works for me in that case, I guess. Since we don't productize
> this code and it's for CI only, it can live here imho.
>
> Thanks,
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo]Testing ironic in the overcloud

2018-02-01 Thread Derek Higgins
On 1 February 2018 at 15:36, Emilien Macchi <emil...@redhat.com> wrote:

>
>
> On Thu, Feb 1, 2018 at 6:35 AM, Derek Higgins <der...@redhat.com> wrote:
>
>> Hi All,
>>I've been working on a set of patches as a WIP to test ironic in the
>> overcloud[1], the approach I've started with is to add ironic into the
>> overcloud controller in scenario004. Also to run a script on the controller
>> (as a NodeExtraConfigPost) that sets up a VM with vbmc that can then be
>> controlled by ironic. The WIP currently replaces the current tempest tests
>> with some commands to sanity test the setup. This essentially works but
>> things need to be cleaned up a bit so I've a few questions
>>
>> o Is scenario004 the correct choice?
>>
>
> Because we might increase the timeout risk on scenario004, I would
> recommend to create a new dedicated scenario that would deploy a very basic
> overcloud with just ironic + dependencies (keystone, glance, neutron, and
> nova?)
>

Ok, I can do this



>
>
>>
>> o Should I create a new tempest test for baremetal as some of the
>> networking stuff is different?
>>
>
> I think we would need to run baremetal tests for this new featureset, see
> existing files for examples.
>
Do you mean that we should use existing tests somewhere or create new ones?



>
>
>>
>> o Is running a script on the controller with NodeExtraConfigPost the best
>> way to set this up or should I be doing something with quickstart? I don't
>> think quickstart currently runs things on the controler does it?
>>
>
> What kind of thing do you want to run exactly?
>
The contents to this file will give you an idea, somewhere I need to setup
a node that ironic will control with ipmi
https://review.openstack.org/#/c/485261/19/ci/common/vbmc_setup.yaml


> I'll let the CI squad replies as well but I think we need a new scenario,
> that we would only run when touching ironic files in tripleo. Using
> scenario004 really increase the risk of timeout and we don't want it.
>
Ok




>
> Thanks for this work!
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo]Testing ironic in the overcloud

2018-02-01 Thread Derek Higgins
Hi All,
   I've been working on a set of patches as a WIP to test ironic in the
overcloud[1], the approach I've started with is to add ironic into the
overcloud controller in scenario004. Also to run a script on the controller
(as a NodeExtraConfigPost) that sets up a VM with vbmc that can then be
controlled by ironic. The WIP currently replaces the current tempest tests
with some commands to sanity test the setup. This essentially works but
things need to be cleaned up a bit so I've a few questions

o Is scenario004 the correct choice?

o Should I create a new tempest test for baremetal as some of the
networking stuff is different?

o Is running a script on the controller with NodeExtraConfigPost the best
way to set this up or should I be doing something with quickstart? I don't
think quickstart currently runs things on the controler does it?

thanks,
Derek.

[1] - https://review.openstack.org/#/c/485261
  https://review.openstack.org/#/c/509728/
  https://review.openstack.org/#/c/509829/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Tis the season...for a cloud reboot

2017-12-19 Thread Derek Higgins
On 19 December 2017 at 22:23, Brian Haley  wrote:

> On 12/19/2017 04:00 PM, Ben Nemec wrote:
>
>>
>>
>> On 12/19/2017 02:43 PM, Brian Haley wrote:
>>
>>> On 12/19/2017 11:53 AM, Ben Nemec wrote:
>>>
 The reboot is done (mostly...see below).

 On 12/18/2017 05:11 PM, Joe Talerico wrote:

> Ben - Can you provide some links to the ovs port exhaustion issue for
> some background?
>

 I don't know if we ever had a bug opened, but there's some discussion
 of it in http://lists.openstack.org/pipermail/openstack-dev/2016-Dece
 mber/109182.html   I've also copied Derek since I believe he was the
 one who found it originally.

 The gist is that after about 3 months of tripleo-ci running in this
 cloud we start to hit errors creating instances because of problems
 creating OVS ports on the compute nodes.  Sometimes we see a huge number of
 ports in general, other times we see a lot of ports that look like this:

 Port "qvod2cade14-7c"
  tag: 4095
  Interface "qvod2cade14-7c"

 Notably they all have a tag of 4095, which seems suspicious to me.  I
 don't know whether it's actually an issue though.

>>>
>>> Tag 4095 is for "dead" OVS ports, it's an unused VLAN tag in the agent.
>>>
>>> The 'qvo' here shows it's part of the VETH pair that os-vif created when
>>> it plugged in the VM (the other half is 'qvb'), and they're created so that
>>> iptables rules can be applied by neutron.  It's part of the "old" way to do
>>> security groups with the OVSHybridIptablesFirewallDriver, and can
>>> eventually go away once the OVSFirewallDriver can be used everywhere
>>> (requires newer OVS and agent).
>>>
>>> I wonder if you can run the ovs_cleanup utility to clean some of these
>>> up?
>>>
>>
>> As in neutron-ovs-cleanup?  Doesn't that wipe out everything, including
>> any ports that are still in use?  Or is there a different tool I'm not
>> aware of that can do more targeted cleanup?
>>
>
> Crap, I thought there was an option to just cleanup these dead devices, I
> should have read the code, it's either neutron ports (default) or all
> ports.  Maybe that should be an option.


iirc neutron-ovs-cleanup was being run following the reboot as part of
a ExecStartPre= on one of the neutron services this is what essentially
removed the ports for us.



>
>
> -Brian
>
>
> Oh, also worth noting that I don't think we have os-vif in this cloud
>> because it's so old.  There's no os-vif package installed anyway.
>>
>>
>>> -Brian
>>>
>>> I've had some offline discussions about getting someone on this cloud to
 debug the problem.  Originally we decided not to pursue it since it's not
 hard to work around and we didn't want to disrupt the environment by trying
 to move to later OpenStack code (we're still back on Mitaka), but it was
 pointed out to me this time around that from a downstream perspective we
 have users on older code as well and it may be worth debugging to make sure
 they don't hit similar problems.

 To that end, I've left one compute node un-rebooted for debugging
 purposes.  The downstream discussion is ongoing, but I'll update here if we
 find anything.


> Thanks,
> Joe
>
> On Mon, Dec 18, 2017 at 10:43 AM, Ben Nemec 
> wrote:
>
>> Hi,
>>
>> It's that magical time again.  You know the one, when we reboot rh1
>> to avoid
>> OVS port exhaustion. :-)
>>
>> If all goes well you won't even notice that this is happening, but
>> there is
>> the possibility that a few jobs will fail while the te-broker host is
>> rebooted so I wanted to let everyone know.  If you notice anything
>> else
>> hosted in rh1 is down (tripleo.org, zuul-status, etc.) let me know.
>> I have
>> been known to forget to restart services after the reboot.
>>
>> I'll send a followup when I'm done.
>>
>> -Ben
>>
>> __
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.op
>> enstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
> __
>
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.op
> enstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
 __

 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.op
 enstack.org?subject:unsubscribe
 

Re: [openstack-dev] [ironic] Kernel parameters needed to boot from iscsi

2017-11-01 Thread Derek Higgins
On 30 October 2017 at 15:16, Julia Kreger  wrote:
...
>>> When I tried it I got this
>>> [  370.704896] dracut-initqueue[387]: Warning: iscistart: Could not
>>> get list of targets from firmware.
>>>
>>> perhaps we could alter iscistart to not complain if there are no
>>> targets attached and just continue, then simply always have
>>> rd.iscsi.firmware=1 in the kernel param regardless of storage type
>>
>
> For those that haven't been following IRC discussion, Derek was kind
> enough to submit a pull request to address this in dracut.

The relevant fix is here
https://github.com/dracutdevs/dracut/pull/298

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] Kernel parameters needed to boot from iscsi

2017-10-25 Thread Derek Higgins
On 25 October 2017 at 13:03, Dmitry Tantsur  wrote:
> (ooops, I somehow missed this email. sorry!)
>
> Hi Yolanda,
>
> On 10/16/2017 11:06 AM, Yolanda Robla Mota wrote:
>>
>> Hi
>> Recently i've been helping some customers in the boot from ISCSI feature.
>> So far everything was working, but we had a problem when booting the
>> deployment image.
>> It needed specifically a flag rd.iscsi.ibft=1 rd.iscsi.firmware=1 in the
>> grub commands. But as the generated deployment image doesn't contain these
>> flags, ISCSI was not booting properly. For other hardware setups, different
>> flags may be needed.
>
>
> Note that we only support BFV in the form of booting from a cinder volume
> officially. We haven't looked into iBFV in depth.
>
>> The solution was to manually execute a virt-customize on the deployment
>> image to hardcode these parameters.
>> I wonder if we can add some feature in Ironic to support it. We have
>> discussed about kernel parameters several times. But at this time, it
>> affects ISCSI booting. Not having a way in Ironic to customize these
>> parameters forces to manual workarounds.
>
>
> This has been discussed several times, and every time the idea of making it
> a generic feature was rejected. There is an option to configure kernel
> parameters for PXE boot. However, apparently, you cannot add
> rd.iscsi.firmware=1 if you don't use iSCSI, it will fail to boot (Derek told
> me that, I did not check).
When I tried it I got this
[  370.704896] dracut-initqueue[387]: Warning: iscistart: Could not
get list of targets from firmware.

perhaps we could alter iscistart to not complain if there are no
targets attached and just continue, then simply always have
rd.iscsi.firmware=1 in the kernel param regardless of storage type

> If your deployment only uses iSCSI - you can
> modify [pxe]pxe_append_params in your ironic.conf to include it.

I'm not sure this would help, in the boot from cinder volume case the
iPXE script simply attaches the target and then hands control over to
boot what ever is on the target. The kernel parameters use are already
baked into the grub config. iPXE doesn't alter them and IPA isn't
involved at all.

If anybody is looking to try any of this out in tripleo, here are some
instructions to boot from cinder volume with ironic on a tripleo
overcloud
https://etherpad.openstack.org/p/tripleo-bfv

>
>
>> So can we reconsider the proposal to add kernel parameters there? It could
>> be a settable argument (driver_info/kernel_args), and then the IPA could set
>> the parameters properly on the image. Or any other option is welcome.
>> What are your thoughts there?
>
>
> Well, we could probably do that *for IPA only*. Something like
> driver_info/deploy_image_append_params. This is less controversial than
> doing that for user instances, as we fully control the IPA boot. If you want
> to work on it, let's start with a detailed RFE please.
>
>>
>> Thanks
>>
>> --
>>
>> Yolanda Robla Mota
>>
>> Principal Software Engineer, RHCE
>>
>> Red Hat
>>
>> 
>>
>> C/Avellana 213
>>
>> Urb Portugal
>>
>> yrobl...@redhat.com  M: +34605641639
>> 
>>
>> 
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] scenario006 conflict

2017-08-16 Thread Derek Higgins
On 19 July 2017 at 17:02, Derek Higgins <der...@redhat.com> wrote:
> On 17 July 2017 at 15:56, Derek Higgins <der...@redhat.com> wrote:
>> On 17 July 2017 at 15:37, Emilien Macchi <emil...@redhat.com> wrote:
>>> On Thu, Jul 13, 2017 at 6:01 AM, Emilien Macchi <emil...@redhat.com> wrote:
>>>> On Thu, Jul 13, 2017 at 1:55 AM, Derek Higgins <der...@redhat.com> wrote:
>>>>> On 12 July 2017 at 22:33, Emilien Macchi <emil...@redhat.com> wrote:
>>>>>> On Wed, Jul 12, 2017 at 2:23 PM, Emilien Macchi <emil...@redhat.com> 
>>>>>> wrote:
>>>>>> [...]
>>>>>>> Derek, it seems like you want to deploy Ironic on scenario006
>>>>>>> (https://review.openstack.org/#/c/474802). I was wondering how it
>>>>>>> would work with multinode jobs.
>>>>>>
>>>>>> Derek, I also would like to point out that
>>>>>> https://review.openstack.org/#/c/474802 is missing the environment
>>>>>> file for non-containerized deployments & and also the pingtest file.
>>>>>> Just for the record, if we can have it before the job moves in gate.
>>>>>
>>>>> I knew I had left out the ping test file, this is the next step but I
>>>>> can create a noop one for now if you'd like?
>>>>
>>>> Please create a basic pingtest with common things we have in other 
>>>> scenarios.
>>>>
>>>>> Is the non-containerized deployments a requirement?
>>>>
>>>> Until we stop supporting non-containerized deployments, I would say yes.
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> --
>>>>>> Emilien Macchi
>>>>
>>>> So if you create a libvirt domain, would it be possible to do it on
>>>> scenario004 for example and keep coverage for other services that are
>>>> already on scenario004? It would avoid to consume a scenario just for
>>>> Ironic. If not possible, then talk with Flavio and one of you will
>>>> have to prepare scenario007 or 0008, depending where Numans is in his
>>>> progress to have OVN coverage as well.
>>>
>>> I haven't seen much resolution / answers about it. We still have the
>>> conflict right now and open questions.
>>>
>>> Derek, Flavio - let's solve this one this week if we can.
>> Yes, I'll be looking into using scenario004 this week. I was traveling
>> last week so wasn't looking at it.
>
> I'm not sure if this is what you had intended but I believe to do
> this(i.e. test the nova ironic driver) we we'll
> need to swap out the nova libvirt driver for the ironic one. I think
> this is ok as the libvirt driver has coverage
> in other scenarios.
>
> Because there are no virtual BMC's setup yet on the controller I also
> have to remove the instance creation,
> but if merged I'll next work on adding these now. So I'm think
> something like this
> https://review.openstack.org/#/c/485261/

Quick update here, after talking to Emilien about this, I'll add to
this patch to set up VirtualBMC instances and not remove instance
creation. So it continues to test a ceph backed glance.

>
>>
>>>
>>> Thanks,
>>> --
>>> Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] scenario006 conflict

2017-07-19 Thread Derek Higgins
On 17 July 2017 at 15:56, Derek Higgins <der...@redhat.com> wrote:
> On 17 July 2017 at 15:37, Emilien Macchi <emil...@redhat.com> wrote:
>> On Thu, Jul 13, 2017 at 6:01 AM, Emilien Macchi <emil...@redhat.com> wrote:
>>> On Thu, Jul 13, 2017 at 1:55 AM, Derek Higgins <der...@redhat.com> wrote:
>>>> On 12 July 2017 at 22:33, Emilien Macchi <emil...@redhat.com> wrote:
>>>>> On Wed, Jul 12, 2017 at 2:23 PM, Emilien Macchi <emil...@redhat.com> 
>>>>> wrote:
>>>>> [...]
>>>>>> Derek, it seems like you want to deploy Ironic on scenario006
>>>>>> (https://review.openstack.org/#/c/474802). I was wondering how it
>>>>>> would work with multinode jobs.
>>>>>
>>>>> Derek, I also would like to point out that
>>>>> https://review.openstack.org/#/c/474802 is missing the environment
>>>>> file for non-containerized deployments & and also the pingtest file.
>>>>> Just for the record, if we can have it before the job moves in gate.
>>>>
>>>> I knew I had left out the ping test file, this is the next step but I
>>>> can create a noop one for now if you'd like?
>>>
>>> Please create a basic pingtest with common things we have in other 
>>> scenarios.
>>>
>>>> Is the non-containerized deployments a requirement?
>>>
>>> Until we stop supporting non-containerized deployments, I would say yes.
>>>
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> Emilien Macchi
>>>
>>> So if you create a libvirt domain, would it be possible to do it on
>>> scenario004 for example and keep coverage for other services that are
>>> already on scenario004? It would avoid to consume a scenario just for
>>> Ironic. If not possible, then talk with Flavio and one of you will
>>> have to prepare scenario007 or 0008, depending where Numans is in his
>>> progress to have OVN coverage as well.
>>
>> I haven't seen much resolution / answers about it. We still have the
>> conflict right now and open questions.
>>
>> Derek, Flavio - let's solve this one this week if we can.
> Yes, I'll be looking into using scenario004 this week. I was traveling
> last week so wasn't looking at it.

I'm not sure if this is what you had intended but I believe to do
this(i.e. test the nova ironic driver) we we'll
need to swap out the nova libvirt driver for the ironic one. I think
this is ok as the libvirt driver has coverage
in other scenarios.

Because there are no virtual BMC's setup yet on the controller I also
have to remove the instance creation,
but if merged I'll next work on adding these now. So I'm think
something like this
https://review.openstack.org/#/c/485261/

>
>>
>> Thanks,
>> --
>> Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] scenario006 conflict

2017-07-17 Thread Derek Higgins
On 17 July 2017 at 15:37, Emilien Macchi <emil...@redhat.com> wrote:
> On Thu, Jul 13, 2017 at 6:01 AM, Emilien Macchi <emil...@redhat.com> wrote:
>> On Thu, Jul 13, 2017 at 1:55 AM, Derek Higgins <der...@redhat.com> wrote:
>>> On 12 July 2017 at 22:33, Emilien Macchi <emil...@redhat.com> wrote:
>>>> On Wed, Jul 12, 2017 at 2:23 PM, Emilien Macchi <emil...@redhat.com> wrote:
>>>> [...]
>>>>> Derek, it seems like you want to deploy Ironic on scenario006
>>>>> (https://review.openstack.org/#/c/474802). I was wondering how it
>>>>> would work with multinode jobs.
>>>>
>>>> Derek, I also would like to point out that
>>>> https://review.openstack.org/#/c/474802 is missing the environment
>>>> file for non-containerized deployments & and also the pingtest file.
>>>> Just for the record, if we can have it before the job moves in gate.
>>>
>>> I knew I had left out the ping test file, this is the next step but I
>>> can create a noop one for now if you'd like?
>>
>> Please create a basic pingtest with common things we have in other scenarios.
>>
>>> Is the non-containerized deployments a requirement?
>>
>> Until we stop supporting non-containerized deployments, I would say yes.
>>
>>>>
>>>> Thanks,
>>>> --
>>>> Emilien Macchi
>>
>> So if you create a libvirt domain, would it be possible to do it on
>> scenario004 for example and keep coverage for other services that are
>> already on scenario004? It would avoid to consume a scenario just for
>> Ironic. If not possible, then talk with Flavio and one of you will
>> have to prepare scenario007 or 0008, depending where Numans is in his
>> progress to have OVN coverage as well.
>
> I haven't seen much resolution / answers about it. We still have the
> conflict right now and open questions.
>
> Derek, Flavio - let's solve this one this week if we can.
Yes, I'll be looking into using scenario004 this week. I was traveling
last week so wasn't looking at it.

>
> Thanks,
> --
> Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] scenario006 conflict

2017-07-13 Thread Derek Higgins
On 12 July 2017 at 22:33, Emilien Macchi  wrote:
> On Wed, Jul 12, 2017 at 2:23 PM, Emilien Macchi  wrote:
> [...]
>> Derek, it seems like you want to deploy Ironic on scenario006
>> (https://review.openstack.org/#/c/474802). I was wondering how it
>> would work with multinode jobs.
>
> Derek, I also would like to point out that
> https://review.openstack.org/#/c/474802 is missing the environment
> file for non-containerized deployments & and also the pingtest file.
> Just for the record, if we can have it before the job moves in gate.

I knew I had left out the ping test file, this is the next step but I
can create a noop one for now if you'd like?

Is the non-containerized deployments a requirement?

>
> Thanks,
> --
> Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] scenario006 conflict

2017-07-13 Thread Derek Higgins
On 12 July 2017 at 22:23, Emilien Macchi  wrote:
> Hey folks,
>
> Derek, it seems like you want to deploy Ironic on scenario006
> (https://review.openstack.org/#/c/474802). I was wondering how it
> would work with multinode jobs.

The idea was that we would create a libvirt domain on the overcloud
controller that Ironic could then control with VirtualBMC. But for the
moment the job only installs Ironic and I was going to build onto it
from there.

> Also, Flavio would like to test k8s on scenario006:
> https://review.openstack.org/#/c/471759/ . To avoid having too much
> scenarios and complexity, I think if ironic tests can be done on a
> 2nodes job, then we can deploy ironic on scenario004 maybe. If not,
> then please give the requirements so we can see how to structure it.

I'll take look and see whats possible

>
> For Flavio's need, I think we need a dedicated scenario for now, since
> he's not going to deploy any OpenStack service on the overcloud for
> now, just k8s.
>
> Thanks for letting us know the plans, so we can keep the scenarios in
> good shape.
> Note: Numans also wants to test OVN and I suggested to create
> scenario007 (since we can't deploy OVN before Pike, so upgrades
> wouldn't work).
> Note2: it seems like efforts done to test complex HA architectures
> weren't finished in scenario005 - Michele: any thoughts on this one?
> should we remove it now or do we expect it working one day?
>
>
> Thanks,
> --
> Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] rh1 issues post-mortem

2017-03-24 Thread Derek Higgins
On 22 March 2017 at 22:36, Ben Nemec  wrote:
> Hi all (owl?),
>
> You may have missed it in all the ci excitement the past couple of days, but
> we had a partial outage of rh1 last night.  It turns out the OVS port issue
> Derek discussed in
> http://lists.openstack.org/pipermail/openstack-dev/2016-December/109182.html
> reared its ugly head on a few of our compute nodes, which caused them to be
> unable to spawn new instances.  They kept getting scheduled since it looked
> like they were underutilized, which caused most of our testenvs to fail.
>
> I've rebooted the affected nodes, as well as a few more that looked like
> they might run into the same problem in the near future.  Everything looks
> to be working well again since sometime this morning (when I disabled the
> broken compute nodes), but there aren't many jobs passing due to the
> plethora of other issues we're hitting in ci.  There have been some stable
> job passes though so I believe things are working again.
>
> As far as preventing this in the future, the right thing to do would
> probably be to move to a later release of OpenStack (either point or major)
> where hopefully this problem would be fixed.  However, I'm hesitant to do
> that for a few reasons.  First is "the devil you know". Outside of this
> issue, we've gotten rh1 pretty rock solid lately.  It's been overworked, but
> has been cranking away for months with no major cloud-related outages.
> Second is that an upgrade would be a major process, probably involving some
> amount of downtime.  Since the long-term plan is to move everything to RDO
> cloud I'm not sure that's the best use of our time at this point.

+1 on keeping the status quo until moving to rdo-cloud.

>
> Instead, my plan for the near term is to keep a closer eye on the error
> notifications from the services.  We previously haven't had anything
> consuming those, but I've dropped a little tool on the controller that will
> dump out error notifications so we can watch for signs of this happening
> again.  I suspect the signs were there long before the actual breakage
> happened, but nobody was looking for them.  Now I will be.
>
> So that's where things stand with rh1.  Any comments or concerns welcome.
>
> Thanks.
>
> -Ben
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] DB cleanup cron jobs added to rh1

2016-12-19 Thread Derek Higgins
On 16 December 2016 at 21:40, Ben Nemec  wrote:
> Just a heads up for everyone, I've added some DB cleanup jobs to rh1 which
> will hopefully prevent the performance degradations over time that we've
> been seeing in that environment.  Specifically, the crontab now looks like
> this:
>
> # Clean up heat db
> 0 5 * * * heat-manage purge_deleted 7
> # Archive nova db entries 5 times so we get everything
> 0,10,20,30,40 6 * * * nova-manage db archive_deleted_rows --max_rows 10
>

lgtm, We did a number of things last week in order to deal with the
performance problems, these db archive commands cover part of it and
needed to be added.

The other thing we need to keep an eye on is the number of OVS ports
on each compute node, on some compute nodes we had over 2000 ovs ports
and ovs couldn't be restarted(and some other services). Ultimately to
deal with this we rebooted each compute node and allowed
neutron-ovs-cleanup to delete the unused ports. This wasn't ideal
because
1) neutron-ovs-cleanup in some cases took a long time to delete the
ports, and delayed nova-compute from starting up, so compute nodes
couldn't be used for a extended period of time.
2) This also caused a reboot of the infrastructure nodes we run on rh1
(e.g. proxy, te-broker etc...) some of these didn't come back as
expected, where needed patches have been submitted[1][2]

I haven't done anything to prevent the ports building up again, so
this is probably still a ongoing issue we need to look into

[1] - https://review.openstack.org/#/c/409930/
[2] - https://review.openstack.org/#/c/406927/

> I picked 5 and 6 AM UTC because I think that's before most people in the EU
> are starting and well after the US is done so the cloud should be pretty
> quiet at that time.
>
> I think it's worth noting that we should probably be setting up this sort of
> thing on initial deployment by default.  Maybe we are now (rh1 is still back
> on Mitaka), but if not we should figure out some appropriate defaults.
>
> -Ben
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Setting up to 3rd party CI OVB jobs

2016-10-11 Thread Derek Higgins
On 7 October 2016 at 14:03, Paul Belanger  wrote:
> Greetings,
>
> I wanted to propose a work item, that I am happy to spearhead, about setting 
> up
> a 3rd party CI system for tripleo project. The work I am proposing, wouldn't
> actually affect anything today about tripleo-ci but provider a working example
> of how 3rd party CI will work and potential migration path.

Great, if we are to transition to 3rd party CI this getting a trial up
and running first would be great to minimize down time if we are to
move jobs in future

>
> This is just one example of how it would work, obviously everything is open 
> for
> discussions but I think you'll find the plan to be workable. Additionally, 
> this
> topic would only apply to OVB jobs, existing jobs already running on cloud
> providers from openstack-infra would not be affected.
>
> What I am proposing is we move tripleo-test-cloud-rh2 (currently disabled) 
> from
> openstack-infra (nodepool) to rdoproject (nodepool).  This give us a cloud we
> can use for OVB; we know it works because OVB jobs have run on it before.

+1, there are some user currently on RH2 using it as a dev
environment, but if we start small this wont be a problem and those
users should eventually be moving too a different cloud

>
> There is a few issues we'd first need to work on, specifically since
> rdoproject.org is currently using SoftwareFactory[1] we'd need to have them
> adding support for nodepool-builder. This is needed so we can use the existing
> DIB elements that openstack-infra does to create centos-7 images (which 
> tripleo
> uses today). We have 2 options, wait for SF team to add support for this (I
> don't know how long that is, but they know of the request) or we manually 
> setup
> a external nodepool-builder instance for rdoproject.org, which connects to
> nodepool.rdoproject.org via gearman (I suggest we do this).

As a 3rd option, is it possible to just use the centos cloud image
directly? The majority of the data cached on the DIB built image isn't
actually used by tripleo-ci?

>
> Once that issue is solved, things are a little easier.  It would just be a
> matter of porting upstream CI configuration to rdoproject.org and validating
> images, JJB jobs and test validation. Cloud credentials removed from
> openstack-infra and added to rdoproject.org.
>
> I'd basically need help from rdoproject (eg: dmsimard) with some of the admin
> tasks, a long with a VM for nodepool-builder. We already have the 3rdparty CI
> bits setup in rdoproject.org, we are actually running DLRN builds on
> python-tripleoclient / python-openstackclient upstream patches.

Sounds good(assuming the RDO community are ok with allowing us to add
jobs over there)

>
> I think the biggest step is getting nodepool-builder working with Software
> Factory, but once that is done, it should be straightforward work.
>
> Now, if SoftwareFactory is the long term home for this system is open for
> debate.  Obviously, rdoproject has the majority of this infrastructure in 
> plan,
> so it makes for a good place to run tripleo-ci OVB jobs.  Other wise, if there
> are issue, then tripleo would have to stand up their own jenkins/nodepool/zuul
> infrastructure and maintain it.
>
> I'm happy to answer questions,
> Paul
>
> [1] http://softwarefactory-project.io/
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] tripleo-test-cloud-rh1 and bastion host

2016-09-13 Thread Derek Higgins
On 9 September 2016 at 16:38, Paul Belanger  wrote:
> Greetings,
>
> I would like to start the discussions around the removal of the bastion host
> that sits in front of tripleo-test-cloud-rh1.  It is my understanding, all
> traffic from tripleo-test-cloud-rh1 flows through this linux box.  Obviously
> this is problematic for a public cloud.
>
> I currently do not know the history of the bastion host, I am hoping this 
> thread
> will start discussions around it.
>
> However, my personal preference is to remove the bastion from the pipeline
> between internet and tripleo-test-cloud-rh1. My main objection to the host, is
> the fact we do packet filtering of traffic flowing between the internet and
> tripleo-test-cloud-rh1.

Would it be enough to simply remove the traffic filtering? or are
there other problems you are hoping to get rid of?

>
> Ideally tripleo-test-cloud-rh1 will simply have an unfiltered network drop on
> the public web, this is how we do it today with the infracloud in
> #openstack-infra.
>
> This will avoid the need to gain access to a private server (bastion) and need
> to manipulate networking traffic.
>
> I'd like for us to try and establish a time frame to make this happen too.

I don't know how much work this would be and what problems we would
hit, historically the upstream tripleo team have been hands off when
it comes to this box(and the rack switch), from our point of view we
use it as a jump host to get to the other hosts on which openstack
runs. And all outside traffic goes through it, I suppose the
alternative would be to route the traffic directly to the overcloud
controller.

We should be moving all our cloud usage onto RDO-Cloud some day, we
should probably try and first get a timeline for when we are moving
onto RDO-Cloud, if that is coming up soon perhaps we can just wait at
this situation goes away.

>
> ---
> Paul
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][CI] Need more undercloud resources

2016-08-25 Thread Derek Higgins
On 25 August 2016 at 02:56, Paul Belanger  wrote:
> On Wed, Aug 24, 2016 at 02:11:32PM -0400, James Slagle wrote:
>> The latest recurring problem that is failing a lot of the nonha ssl
>> jobs in tripleo-ci is:
>>
>> https://bugs.launchpad.net/tripleo/+bug/1616144
>> tripleo-ci: nonha jobs failing with Unable to establish connection to
>> https://192.0.2.2:13004/v1/a90407df1e7f4f80a38a1b1671ced2ff/stacks/overcloud/f9f6f712-8e89-4ea9-a34b-6084dc74b5c1
>>
>> This error happens while polling for events from the overcloud stack
>> by tripleoclient.
>>
>> I can reproduce this error very easily locally by deploying with an
>> ssl undercloud with 6GB ram and 2 vcpus. If I don't enable swap,
>> something gets OOM killed. If I do enable swap, swap gets used (< 1GB)
>> and then I hit this error almost every time.
>>
>> The stack keeps deploying but the client has died, so the job fails.
>> My investigation so far has only pointed out that it's the swap
>> allocation that is delaying things enough to cause the failure.
>>
>> We do not see this error in the ha job even though it deploys more
>> nodes. As of now, my only suspect is that it's the overhead of the
>> initial SSL connections causing the error.
>>
>> If I test with 6GB ram and 4 vcpus I can't reproduce the error,
>> although much more swap is used due to the increased number of default
>> workers for each API service.
>>
>> However, I suggest we just raise the undercloud specs in our jobs to
>> 8GB ram and 4 vcpus. These seem reasonable to me because those are the
>> default specs used by infra in all of their devstack single and
>> multinode jobs spawned on all their other cloud providers. Our own
>> multinode job for the undercloud/overcloud and undercloud only job are
>> running on instances of these sizes.
>>
> Close, our current flavors are 8vCPU, 8GB RAM, 80GB HDD. I'd recommend doing
> that for the undercloud just to be consistent.

The HD on most of the compute nodes are 200GB so we've been trying
really hard[1] to keep the disk usage for each instance down so that
we can fit as many instances onto each compute nodes as possible
without being restricted by the HD's. We've also allowed nova to
overcommit on storage by a factor of 3. The assumption is that all of
the instances are short lived and a most of them never fully exhaust
the storage allocated to them. Even the ones that do (the undercloud
being the one that does) hit peak at different times so everything is
tickety boo.

I'd strongly encourage against using a flavor with a 80GB HDD, if we
increase the disk space available to the undercloud to 80GB then we
will eventually be using it in CI. And 3 undercloud on the same
compute node will end up filling up the disk on that host.

[1] 
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/toci_gate_test.sh#n26

>
> [1] http://docs.openstack.org/infra/system-config/contribute-cloud.html
>
>> Yes, this is just sidestepping the problem by throwing more resources
>> at it. The reality is that we do not prioritize working on optimizing
>> for speed/performance/resources. We prioritize feature work that
>> indirectly (or maybe it's directly?) makes everything slower,
>> especially at this point in the development cycle.
>>
>> We should therefore expect to have to continue to provide more and
>> more resources to our CI jobs until we prioritize optimizing them to
>> run with less.
>>
> I actually believe these problem highlights how large tripleo-ci has grown, 
> and
> in need of a refactor. While we won't solve this problem today, I do think
> tripleo-ci is to monolithic today. I believe there is some discussion on
> breaking jobs into different scenarios, but I haven't had a chance to read up 
> on
> that.
>
> I'm hoping in Barcelona we can have a topic on CI pipelines and how better to
> optimize our runs.
>
>> Let me know if there is any disagreement on making these changes. If
>> there isn't, I'll apply them in the next day or so. If there are any
>> other ideas on how to address this particular bug for some immediate
>> short term relief, please let me know.
>>
>> --
>> -- James Slagle
>> --
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] [TripleO][CI] Need more undercloud resources

2016-08-25 Thread Derek Higgins
On 24 August 2016 at 19:11, James Slagle  wrote:
> The latest recurring problem that is failing a lot of the nonha ssl
> jobs in tripleo-ci is:
>
> https://bugs.launchpad.net/tripleo/+bug/1616144
> tripleo-ci: nonha jobs failing with Unable to establish connection to
> https://192.0.2.2:13004/v1/a90407df1e7f4f80a38a1b1671ced2ff/stacks/overcloud/f9f6f712-8e89-4ea9-a34b-6084dc74b5c1
>
> This error happens while polling for events from the overcloud stack
> by tripleoclient.
>
> I can reproduce this error very easily locally by deploying with an
> ssl undercloud with 6GB ram and 2 vcpus. If I don't enable swap,
> something gets OOM killed. If I do enable swap, swap gets used (< 1GB)
> and then I hit this error almost every time.
>
> The stack keeps deploying but the client has died, so the job fails.
> My investigation so far has only pointed out that it's the swap
> allocation that is delaying things enough to cause the failure.
>
> We do not see this error in the ha job even though it deploys more
> nodes. As of now, my only suspect is that it's the overhead of the
> initial SSL connections causing the error.
>
> If I test with 6GB ram and 4 vcpus I can't reproduce the error,
> although much more swap is used due to the increased number of default
> workers for each API service.
>
> However, I suggest we just raise the undercloud specs in our jobs to
> 8GB ram and 4 vcpus. These seem reasonable to me because those are the
> default specs used by infra in all of their devstack single and
> multinode jobs spawned on all their other cloud providers. Our own
> multinode job for the undercloud/overcloud and undercloud only job are
> running on instances of these sizes.
>
> Yes, this is just sidestepping the problem by throwing more resources
> at it. The reality is that we do not prioritize working on optimizing
> for speed/performance/resources. We prioritize feature work that
> indirectly (or maybe it's directly?) makes everything slower,
> especially at this point in the development cycle.

Yup, I couldn't agree with this more it is exactly what happens. And
as long as everybody remains driven by particular features its going
to be the case. Ideally we'd have somebody who's driving force is
simply to take what we have at any particular point in time profile
certain pain points and make improvements where they can be made tune
things etc

>
> We should therefore expect to have to continue to provide more and
> more resources to our CI jobs until we prioritize optimizing them to
> run with less.
>
> Let me know if there is any disagreement on making these changes. If
> there isn't, I'll apply them in the next day or so. If there are any
> other ideas on how to address this particular bug for some immediate
> short term relief, please let me know.

Not disagreeing but just a reminder to double check quota's and
over-commit ratios (for vCPU)  so things will still fit where the
should be.

Also its worth noting that act of increasing the number of vCPU's
available to the undercloud will not only increase the memory
requirements of the undercloud (we know this happens) but the extra
services even if unused may cause additional cpu usage on the host so
this is worth monitoring.

>
> --
> -- James Slagle
> --
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][CI] Memory shortage in HA jobs, please increase it

2016-08-19 Thread Derek Higgins
On 19 August 2016 at 11:08, Giulio Fidente <gfide...@redhat.com> wrote:
> On 08/19/2016 11:41 AM, Derek Higgins wrote:
>>
>> On 19 August 2016 at 00:07, Sagi Shnaidman <sshna...@redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> we have a problem again with not enough memory in HA jobs, all of them
>>> constantly fails in CI: http://status-tripleoci.rhcloud.com/
>>
>>
>> Have we any idea why we need more memory all of a sudden? For months
>> the overcloud nodes have had 5G of RAM, then last week[1] we bumped it
>> too 5.5G now we need it bumped too 6G.
>>
>> If a new service has been added that is needed on the overcloud then
>> bumping to 6G is expected and probably the correct answer but I'd like
>> to see us avoiding blindly increasing the resources each time we see
>> out of memory errors without investigating if there was a regression
>> causing something to start hogging memory.
>
>
> fwiw, one recent addition was the cinder-backup service
>
> though this service wasn't enabled by default in mitaka so with [1] we can
> disable the service by default for newton as well

we still got memory errors with this patch, I'm going to bump up to 6G
as sagi suggested to temporarily unblock things but I strongly suggest
somebody looks into this and we prioritized undoing the bump if
possible next week.

>
> 1. https://review.openstack.org/#/c/357729
>
> --
> Giulio Fidente
> GPG KEY: 08D733BA | IRC: gfidente

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][CI] Memory shortage in HA jobs, please increase it

2016-08-19 Thread Derek Higgins
On 19 August 2016 at 00:07, Sagi Shnaidman  wrote:
> Hi,
>
> we have a problem again with not enough memory in HA jobs, all of them
> constantly fails in CI: http://status-tripleoci.rhcloud.com/

Have we any idea why we need more memory all of a sudden? For months
the overcloud nodes have had 5G of RAM, then last week[1] we bumped it
too 5.5G now we need it bumped too 6G.

If a new service has been added that is needed on the overcloud then
bumping to 6G is expected and probably the correct answer but I'd like
to see us avoiding blindly increasing the resources each time we see
out of memory errors without investigating if there was a regression
causing something to start hogging memory.

Sorry if it seems like I'm being picky about this (I seem to resist
these bumps every time they come up) but there are two good reasons to
avoid this if possible
o at peak we are currently configured to run 75 simultaneous jobs
(although we probably don't reach that at the moment), and each HA job
has 5 baremetal nodes so bumping from 5G too 6G increases the amount
of RAM ci can use at peak by 375G
o When we bump the RAM usage of baremetal nodes from 5G too 6G what
we're actually doing is increasing the minimum requirements for
developers from 28G(or whatever the number is now) too 32G

So before we bump the number can we just check first if its justified,
as I've watched this number increase from 2G since we started running
tripleo-ci

thanks,
Derek.

[1] - https://review.openstack.org/#/c/353655/

> I've created a patch that will increase it[1], but we need to increase it
> right now on rh1.
> I can't do it now, because unfortunately I'll not be able to watch this if
> it works and no problems appear.
> TripleO CI cloud admins, please increase the memory for baremetal flavor on
> rh1 tomorrow (to 6144?).
>
> Thanks
>
> [1] https://review.openstack.org/#/c/357532/
> --
> Best regards
> Sagi Shnaidman

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] tripleo-test-cloud-rh2 local mirror server

2016-07-29 Thread Derek Higgins
On 27 July 2016 at 17:52, Paul Belanger <pabelan...@redhat.com> wrote:
> On Wed, Jul 27, 2016 at 02:54:00PM +0100, Derek Higgins wrote:
>> On 21 July 2016 at 23:04, Paul Belanger <pabelan...@redhat.com> wrote:
>> > Greetings,
>> >
>> > I write today to see how I can remove this server from 
>> > tripleo-test-cloud-rh2. I
>> > have an open patch[1] currently to migrate tripleo-ci to use our AFS 
>> > mirrors for
>> > centos and epel.  However, I'm still struggling to see what else you are 
>> > using
>> > the local mirror for.
>> >
>> > From what I see, there appears to be some puppet modules in the mirror?
>> >
>> > The reason I am doing this work, is to help bring tripleo inline with
>> > openstack-infra tooling.  There shouldn't be the need for a project to 
>> > maintain
>> > its own infrastructure outside of openstack-infra.  If so, I see that as 
>> > some
>> > sort of a failure between the project and openstack-infra.   And with that 
>> > in
>> > mind, I am here to help fix that.
>> >
>> > For the most part, I think we have everything currently in place to 
>> > migrate away
>> > from your locally mirror. I just need some help figuring what else is left 
>> > and
>> > then delete it.
>>
>> Hi Paul,
>> The mirror server hosts 3 sets of data used in CI long with a cron
>> a job aimed at promoting trunk repositories,
>> The first you've already mentioned, there is a list of puppet modules
>> hosted here, we soon hope to move to packaged puppet modules so the
>> need for this will go away.
>>
> Ya, I was looking at an open review to rework this. If we moved these puppet
> modules to tarballs over git repos, I think we could mirror them pretty easy
> into our AFS mirrors.  Them being git repos requires more work because some
> policies around git repos.

We wont need to do anything here, the patch to move away from git
repos will be instead using the rdo packaged puppet modules, so we
wont need anything from infra for this, we just end up using the rdo
repository like we do for all other openstack projects.

>
>> The second is a mirror of the centos cloud images, these are updated
>> hourly by the centos-cloud-images cronjob[1], I guess these could be
>> easily replaced with the AFS server
>>
> So 2 things here.
>
> 1) I've reached out to CentOS asking to enable rsync support on
> http://cloud.centos.org/ if they do that, I can easily enable rsync for it.

Great

>
> 2) What about moving away from the centos diskimage-builder element and switch
> to centos-minimal element. I have an open review for this, but need help on
> actually testing this.  It moves away from using the cloud image, and instead
> uses yumdownloader to prebuild the images.

Its possible, but I think out of scope for a general ci thread, its
more of a tripleo decision so maybe needs its own thread to get a
wider audience.

>
>> Then we come to the parts where it will probably be more tricky to
>> move away from our own server
>>
>> o cached images - our nightly periodic jobs run tripleo ci with
>> master/HEAD for all openstack projects (using the most recent rdo
>> trunk repository), if the jobs pass then we upload the overcloud-full
>> and ipa images to the mirror server along with logging what jobs
>> passed, this happens at the end of toci_instack.sh[2], nothing else
>> happens at this point the files are just uploaded nothing starts using
>> them yet.
>>
> I suggest we move this to tarballs.o.o for now, this is what other projects 
> are
> doing.  I believe we are also considering moving this process into AFS too.

Ok, its an option worth looking at if we could make it work.

>
>> o promote script - hourly we then run the promote script[3], this
>> script is whats responsible for the promotion of the master rdo
>> repository that is used by tripleo ci (and devs), it checks to see if
>> images have been updated to the mirror server by the periodic jobs,
>> and if all of the jobs we care about (currently
>> periodic-tripleo-ci-centos-7-ovb-ha
>> periodic-tripleo-ci-centos-7-ovb-nonha[4]) passed then it does 2
>> things
>>   1. updates the current-tripleo link on the mirror server[5]
>>   2. updates the current-tripleo link on the rdo trunk server[6]
>> By doing this we ensure that the the current-tripleo link on the rdo
>> trunk server is always pointing to something that has passed tripleo
>> ci jobs, and that tripleo ci is using cached images that were built
>> using this repository
>>
&g

Re: [openstack-dev] [tripleo] tripleo-test-cloud-rh2 local mirror server

2016-07-27 Thread Derek Higgins
On 21 July 2016 at 23:04, Paul Belanger  wrote:
> Greetings,
>
> I write today to see how I can remove this server from 
> tripleo-test-cloud-rh2. I
> have an open patch[1] currently to migrate tripleo-ci to use our AFS mirrors 
> for
> centos and epel.  However, I'm still struggling to see what else you are using
> the local mirror for.
>
> From what I see, there appears to be some puppet modules in the mirror?
>
> The reason I am doing this work, is to help bring tripleo inline with
> openstack-infra tooling.  There shouldn't be the need for a project to 
> maintain
> its own infrastructure outside of openstack-infra.  If so, I see that as some
> sort of a failure between the project and openstack-infra.   And with that in
> mind, I am here to help fix that.
>
> For the most part, I think we have everything currently in place to migrate 
> away
> from your locally mirror. I just need some help figuring what else is left and
> then delete it.

Hi Paul,
The mirror server hosts 3 sets of data used in CI long with a cron
a job aimed at promoting trunk repositories,
The first you've already mentioned, there is a list of puppet modules
hosted here, we soon hope to move to packaged puppet modules so the
need for this will go away.

The second is a mirror of the centos cloud images, these are updated
hourly by the centos-cloud-images cronjob[1], I guess these could be
easily replaced with the AFS server

Then we come to the parts where it will probably be more tricky to
move away from our own server

o cached images - our nightly periodic jobs run tripleo ci with
master/HEAD for all openstack projects (using the most recent rdo
trunk repository), if the jobs pass then we upload the overcloud-full
and ipa images to the mirror server along with logging what jobs
passed, this happens at the end of toci_instack.sh[2], nothing else
happens at this point the files are just uploaded nothing starts using
them yet.

o promote script - hourly we then run the promote script[3], this
script is whats responsible for the promotion of the master rdo
repository that is used by tripleo ci (and devs), it checks to see if
images have been updated to the mirror server by the periodic jobs,
and if all of the jobs we care about (currently
periodic-tripleo-ci-centos-7-ovb-ha
periodic-tripleo-ci-centos-7-ovb-nonha[4]) passed then it does 2
things
  1. updates the current-tripleo link on the mirror server[5]
  2. updates the current-tripleo link on the rdo trunk server[6]
By doing this we ensure that the the current-tripleo link on the rdo
trunk server is always pointing to something that has passed tripleo
ci jobs, and that tripleo ci is using cached images that were built
using this repository

We've had to run this promote script on the mirror server as the
individual jobs run independently and in oder to make the promote
decision we needed somewhere that is aware of the status of all the
jobs

Hope this answers your questions,
Derek.

[1] - 
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/scripts/mirror-server/mirror-server.pp#n40
[2] - 
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/toci_instack.sh#n198
[3] - 
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/scripts/mirror-server/promote.sh
[4] - 
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/scripts/mirror-server/mirror-server.pp#n51
[5] - http://8.43.87.241/builds/current-tripleo/
[6] - 
http://buildlogs.centos.org/centos/7/cloud/x86_64/rdo-trunk-master-tripleo/

>
> [1] https://review.openstack.org/#/c/326143/
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Delorean fail blocks CI for stable branches

2016-07-21 Thread Derek Higgins
Trying to catch up here and summarizing as a top post

so as I see it we have a number of problems that all need sorting out

Regardless of the branch being built, tripleo-ci is currently building
it with the packaging for master
 - in most cases this has worked out ok because most projects don't
have separate packaging for stable branches but it is now biting us
because t-h-t stable can no longer build with the packaging for master
 - This situation started about a month ago, when a commit[1] moved
the code that sets the branch to use in repositories to only happen if
--local isn't present
   - whether this is the correct behavior or not depends on how you
read the description of --local "Use local git repos if possible.", my
original intention for --local was that delorean wouldn't do a new
fetch of the git repositories being used (i.e. only use whats local if
possible), --dev is different it is what intended to be used if you
want the various git repositories to remain as is without being reset.
If the consensus with everybody is the same behavior for --local then
the old behavior is correct and the delorean patch that changed the
behavior of --local was a regression. But before labeling it as one we
need to discuss people expectations of --local as the description
could be more detailed.

Having said all that, in tripleo-ci iirc we need a combination of
--dev and --local and not having an exact match for what we need in
delorean, we've instead relied on knowledge of delorean internals to
put a hack in place so that what we need to happen, happens. It's not
surprising we've gotten broken.

--local alone wont work for us because it has always (and I think
still should) reset local repositories
--dev alone wont work for us because it doesn't update the delorean
database and as a result in cases where we're building multiple
packages during the same CI job we wont end up with a single repo at
the end containing all the packages

so we used --local and worked around the git reset by presetting all
the branch names that our source repository could be set to to match
what zuul told us to use[2]

this is brittle, difficult to follow and something I did a long time
ago expecting us to improve but then never did anything about

So Here is what I propose,

we now have 2 patches that essentially do the same thing[3][4], merge
either of these, based only on which was proposed first i'd go for
345070

Start a discussion in rdo about how exactly people expect --local and
--dev to behave and if there is need for a 3rd option to cover the
tripleo use case, If we find any changes are needed to delorean, make
them and then remove the hacks from tripleo-ci so we're less likely to
hit problems in future.

thanks,
Derek.

[1] https://review.rdoproject.org/r/#/c/1500/
[2] 
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/scripts/tripleo.sh#n359
[3] https://review.openstack.org/#/c/345106/1
[4] https://review.openstack.org/#/c/345070/1

On 21 July 2016 at 07:08, Sagi Shnaidman  wrote:
>
>
> On Thu, Jul 21, 2016 at 3:11 AM, Alan Pevec  wrote:
>>
>> On Wed, Jul 20, 2016 at 7:49 PM, Sagi Shnaidman 
>> wrote:
>> > How then it worked before? Can you show me the patch that broke this
>> > functionality in delorean? It should be about 15 Jul when jobs started
>> > to
>> > fail.
>>
>> commented in lp
>>
>> > How then master branch works? It also runs on patched repo and succeeds.
>>
>> I explained that but looks like we're talking past each other.
>>
>> > I don't think we can use this workaround, each time this source file
>> > will
>> > change - all our jobs will fail again? It's not even a workaround.
>> > Please let's stop discussing and let's solve it finally, it blocks our
>> > CI
>> > for stable patches.
>>
>> Sure, I've assigned https://bugs.launchpad.net/tripleo/+bug/1604039 to
>> myself and proposed a patch.
>>
>
> It's a workaround for short time range, but NOT a solution, if you change
> something in this one file, it'll be broken again. But it does NOT solve the
> main issue - after recent changes in dlrn and specs we can't build repo with
> delorean on stable branches.
> I think it should be solved on DLRN side and should be provided a
> appropriate interface to use it for CI purposes.
> I opened an issue there:
> https://github.com/openstack-packages/DLRN/issues/22
> But you closed it, so I suppose we will not get any solution and help for it
> from your side?
>
> Should we move to other packaging tool?
>
>>
>> Alan
>
>
>
>
> --
> Best regards
> Sagi Shnaidman
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development 

[openstack-dev] [tripleo][infra] RH2 is up and running

2016-07-01 Thread Derek Higgins
Hi All,
Yesterday the final patch merged to run CI jobs on RH2, and last
night we merged the patch to tripleo-ci to support RH2 jobs. So we now
have a new job (gate-tripleo-ci-centos-7-ovb-ha) running on all
tripleo patch reviews. This job is running pacemaker HA with a 3 node
controller and a single compute node. Its basically the same as our
current HA job without net-iso.

Looking at pass rates this morning
1. The jobs are failing on stable branches[1]
  o I've submitted a patch to the mitaka and liberty branch to fix
this(see the bug)
2. The pass rate does seem to be a little lower then the RH1 HA job
  o I'll look into this today but overall the pass rate should be good
enough for when RH1 is taken offline

The main difference between jobs running on rh2 when compared to rh1
is that the CI slave IS the undercloud (we've eliminated the need for
an extra undercloud node), this saves resources. We no longer build a
instack qcow2 image, this saves us a little time.

To make this work, early in the CI process we make a call out to a
geard broker and pass it the instance ID of the undercloud, this
broker creates a heat stack (using OVB heat templates) with a number
nodes on a provisioning network. It then attaches an interface on this
provisioning network to the undercloud[2]. Ironic can then talk (with
ipmi) to a bmc node to power them on and PXE boot them. At the end of
the job the stack is deleted.

Whats next?
o On Tuesday evening next, rh1 will be taken offline so I'll be
submitting a patch to remove all of the RH1 jobs and until we bring it
back up we will only have a single triple-ci job
o The RH1 rack will be back available to us on Thursday, we then have a choice
 1. Bring rh1 back up as is and return everything back to the status quo
 2. Redeploy rh1 with OVB and move away from the legacy system permanently
 If the OVB based jobs prove to be reliable etc.. I think option 2 is
worth thinking about, it wasn't the original plan but it would allow
us move away from a legacy system that is getting harder to support as
time goes on.
o RH2 was a loaned to us to allow this to happen so once we pick
either option above and complete the deployment of RH1 we'll have to
give it back

The OVB based cloud opens up a couple of interesting options to us
that we can explore if we were to stick with using OVB
1. Periodic scale test
  o With OVB its possible to select the number of nodes we place on
the provisioning network, for example while testing rh2 I was able to
deploy a overcloud with 80(we could do up to 120 on rh2 even higher on
rh1) compute nodes, doing this nightly when CI load is low would be an
extremely valuable test to run and gather data on.
2. Dev quota to reproduce CI
  o On OVB its now a lot easier to give somebody some quota to
reproduce exactly what CI is using in order to reproduce problems
etc... this was possible on rh1 but required a cloud admin to manually
take testenvs away from CI(it was manual and messy so we didn't do it
much)

The move doesn't come without its costs

1. tripleo-quickstart
  o Part of the tripleo-quickstart install is to first download a
prebuilt undercloud image that we were building in our periodic job.
Because the undercloud is now the CI slave we no longer build a
instack.qcow2 image. For the near future we can host the most recent
one on RH2(the IP will change so this needs to change in tripleo
quickstart or better still a DNS entry could be used so switch over
would be smother in future) but if we make the move to jobs of this
type permanent we'll no longer be generating this image for
quickstart. So we'll have to see if we can come to an alternative. We
could generate one in the periodic job but I'm not sure how we could
test it easily.

2. moving the current-tripleo pin
  o I havn't put in place yet anything needed for our periodic job to
move the current-tripleo pin, so until we get this done (and decide
what to do about 1. above) we're stuck on what ever pin we happen to
be on on Tuesday when rh1 is taken offline. The pin moved last night
to a repository from 2016-06-29 so we are at least reasonably up to
date. If it looks like the rh1 deployment is going to take an
excessive amount time we'll need to make this a priority.

3. The ability to telnet to CI slaves to get the console for running
CI jobs doesn't work on RH2  jobs, this is because its is using the
same port number(8088) we use in tripleo for ironic to serve its iPXE
images over http. So I've had to kill the console serving process
until we solve this. If we want to fix this we'll have to explore
changing the port number in either tripleo or infra.

I was putting together a screencast of how rh2 was deployed(with RDO
mitaka) but after several hours of editing the screen casts into
something usable the software I was using(openshot) refused to
generate what I had put together, in fact it crashed a lot, so if
anybody has any good suggestions of software I could use I'll try
again.

If I've missed 

[openstack-dev] [TripleO] Delopment environment census refresh

2016-06-14 Thread Derek Higgins
Hi All,
A while back we populated a etherpad[1] with details of our
individual development environments, the idea here was that we can
point newcomers to this page to give them an idea of what hard ware
its possible to deploy triple on and what tools we use drive it.

   Its been just over 3 months since then which isn't a huge
time-frame but things may have changed e.g. resource requirements may
have increased, more people may now be using OVB, tripleo quick start
might have more use then it did back then, and some people might even
have started deploying on unmodified public clouds since this has
started to become possible(although still in the early stages).

   I'd appreciate it if people took a look at[1] below, add a section
for yourself under June 2016, if nothing has changed then just
copy/paste your entry from March.

thanks,
Derek.

[1] https://etherpad.openstack.org/p/tripleo-dev-env-census

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Proposal for a new tool: dlrn-repo

2016-06-13 Thread Derek Higgins
On 13 June 2016 at 21:29, Ben Nemec  wrote:
> So our documented repo setup steps are three curls, a sed, and a
> multi-line bash command.  And the best part?  That's not even what we
> test.  The commands we actually use in tripleo.sh --repo-setup consist
> of the following: three curls, four seds, and (maybe) the same
> multi-line bash command.  Although whether that big list of packages in
> includepkgs is actually up to date with what we're testing is anybody's
> guess because without actually plugging both into a diff tool you
> probably can't visually find any differences.

Looking at the docs I think we should remove the list of packages
altogether, what we document for people trying to use tripleo should
only include the current-tripleo and deps repositories, as we know
this has passed a periodic CI job. This would reduce the documented
process too just 2 curls. The only place we need to worry about
pulling certain packages from /current is in CI and for devs who need
the absolute most up to date tripleo packages in these two cases
tripleo.sh should be used.


> What is my point?  That this whole process is overly complicated and
> error-prone.  If you miss one of those half dozen plus commands you're
> going to end up with a broken repo setup.  As one of the first things
> that a new user has to do in TripleO, this is a pretty poor introduction
> to the project.

Yup, couldn't agree more here, the simpler we can make things for a
new user the better

>
> My proposal is an rdo-release-esque project that will handle the repo
> setup for you, except that since dlrn doesn't really deal in releases I
> think the -repo name makes more sense.  Here's a first pass at such a
> tool: https://github.com/cybertron/dlrn-repo
>
> This would reduce the existing commands in tripleo.sh from:
> sudo sed -i -e 's%priority=.*%priority=30%' $REPO_PREFIX/delorean-deps.repo
> sudo curl -o $REPO_PREFIX/delorean.repo
> $DELOREAN_REPO_URL/$DELOREAN_REPO_FILE
> sudo sed -i -e 's%priority=.*%priority=20%' $REPO_PREFIX/delorean.repo
> sudo curl -o $REPO_PREFIX/delorean-current.repo
> http://trunk.rdoproject.org/centos7/current/delorean.repo
> sudo sed -i -e 's%priority=.*%priority=10%'
> $REPO_PREFIX/delorean-current.repo
> sudo sed -i 's/\[delorean\]/\[delorean-current\]/'
> $REPO_PREFIX/delorean-current.repo
> sudo /bin/bash -c "cat <<-EOF>>$REPO_PREFIX/delorean-current.repo
> includepkgs=diskimage-builder,instack,instack-undercloud,os-apply-config,os-cloud-config,os-collect-config,os-net-config,os-refresh-config,python-tripleoclient,tripleo-common,openstack-tripleo-heat-templates,openstack-tripleo-image-elements,openstack-tripleo,openstack-tripleo-puppet-elements
> EOF"
> sudo yum -y install yum-plugin-priorities
>
> to:
> sudo yum install -y http://tripleo.org/dlrn-repo.rpm # or wherever
> sudo dlrn-repo tripleo-current
>
> As you can see in the readme it also supports the stable branch repos or
> running against latest master of everything.
>
> Overall I think this is clearly a better user experience, and as an
> added bonus it would allow us to use the exact same code for repo
> management on the user side and in CI, which we can't have with a
> developer-specific tool like tripleo.sh.
>
> There's plenty left to do before this would be fully integrated (import
> to TripleO, package, update docs, update CI), so I wanted to solicit
> some broader input before pursuing it further.

I'm a little on the fence about this, I think the main problem you
bring up is the duplication of the includepkgs list, which I think we
can just remove from the docs, so whats left is the ugly blurb of
script in tripleo.sh --repo-setup, using a tool to do this certainly
improves the code, but does the creation of a new project complicate
things in its own way?

If we do go ahead with this the one suggestion I would have is
s/dlrn/trunk/g
delorean is the tool used to create trunk repositories, we shouldn't
care, it may even change some day, we are just dealing with trunk
repositories

>
> Thanks.
>
> -Ben
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Austin summit - session recap/summary

2016-05-19 Thread Derek Higgins
On 19 May 2016 5:38 pm, "Paul Belanger" <pabelan...@redhat.com> wrote:
>
> On Thu, May 19, 2016 at 03:50:15PM +0100, Derek Higgins wrote:
> > On 18 May 2016 at 13:34, Paul Belanger <pabelan...@redhat.com> wrote:
> > > On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:
> > >> On 6 May 2016 at 14:18, Paul Belanger <pabelan...@redhat.com> wrote:
> > >> > On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:
> > >> >> Hi all,
> > >> >>
> > >> >> Some folks have requested a summary of our summit sessions, as
has been
> > >> >> provided for some other projects.
> > >> >>
> > >> >> I'll probably go into more detail on some of these topics either
via
> > >> >> subsequent more focussed threads an/or some blog posts but what
follows is
> > >> >> an overview of our summit sessions[1] with notable actions or
decisions
> > >> >> highlighted.  I'm including some of my own thoughts and
conclusions, folks
> > >> >> are welcome/encouraged to follow up with their own clarifications
or
> > >> >> different perspectives :)
> > >> >>
> > >> >> TripleO had a total of 5 sessions in Austin I'll cover them
one-by-one:
> > >> >>
> > >> >> -
> > >> >> Upgrades - current status and roadmap
> > >> >> -
> > >> >>
> > >> >> In this session we discussed the current state of upgrades -
initial
> > >> >> support for full major version upgrades has been implemented, but
the
> > >> >> implementation is monolithic, highly coupled to pacemaker, and
inflexible
> > >> >> with regard to third-party extraconfig changes.
> > >> >>
> > >> >> The main outcomes were that we will add support for more granular
> > >> >> definition of the upgrade lifecycle to the new composable
services format,
> > >> >> and that we will explore moving towards the proposed lightweight
HA
> > >> >> architecture to reduce the need for so much pacemaker specific
logic.
> > >> >>
> > >> >> We also agreed that investigating use of mistral to drive upgrade
workflows
> > >> >> was a good idea - currently we have a mixture of scripts combined
with Heat
> > >> >> to drive the upgrade process, and some refactoring into discrete
mistral
> > >> >> workflows may provide a more maintainable solution.  Potential
for using
> > >> >> the existing SoftwareDeployment approach directly via mistral
(outside of
> > >> >> the heat templates) was also discussed as something to be further
> > >> >> investigated and prototyped.
> > >> >>
> > >> >> We also touched on the CI implications of upgrades - we've got an
upgrades
> > >> >> job now, but we need to ensure coverage of full
release-to-release upgrades
> > >> >> (not just commit to commit).
> > >> >>
> > >> >> ---
> > >> >> Containerization status/roadmap
> > >> >> ---
> > >> >>
> > >> >> In this session we discussed the current status of containers in
TripleO
> > >> >> (which is to say, the container based compute node which deploys
containers
> > >> >> via Heat onto an an Atomic host node that is also deployed via
Heat), and
> > >> >> what strategy is most appropriate to achieve a fully
containerized TripleO
> > >> >> deployment.
> > >> >>
> > >> >> Several folks from Kolla participated in the session, and there
was
> > >> >> significant focus on where work may happen such that further
collaboration
> > >> >> between communities is possible.  To some extent this discussion
on where
> > >> >> (as opposed to how) proved a distraction and prevented much
discussion on
> > >> >> supportable architectural implementation for TripleO, thus what
follows is
> > >> >> mostly my perspective on the issues that exist:
> > >> >>
> > >> >> Significant uncertainty exists wrt integration between Kolla and
TripleO -
> > >> >> there's largely consensus that we want to consume the container
images

Re: [openstack-dev] [TripleO] Austin summit - session recap/summary

2016-05-19 Thread Derek Higgins
On 18 May 2016 at 13:34, Paul Belanger <pabelan...@redhat.com> wrote:
> On Wed, May 18, 2016 at 12:22:55PM +0100, Derek Higgins wrote:
>> On 6 May 2016 at 14:18, Paul Belanger <pabelan...@redhat.com> wrote:
>> > On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:
>> >> Hi all,
>> >>
>> >> Some folks have requested a summary of our summit sessions, as has been
>> >> provided for some other projects.
>> >>
>> >> I'll probably go into more detail on some of these topics either via
>> >> subsequent more focussed threads an/or some blog posts but what follows is
>> >> an overview of our summit sessions[1] with notable actions or decisions
>> >> highlighted.  I'm including some of my own thoughts and conclusions, folks
>> >> are welcome/encouraged to follow up with their own clarifications or
>> >> different perspectives :)
>> >>
>> >> TripleO had a total of 5 sessions in Austin I'll cover them one-by-one:
>> >>
>> >> -
>> >> Upgrades - current status and roadmap
>> >> -
>> >>
>> >> In this session we discussed the current state of upgrades - initial
>> >> support for full major version upgrades has been implemented, but the
>> >> implementation is monolithic, highly coupled to pacemaker, and inflexible
>> >> with regard to third-party extraconfig changes.
>> >>
>> >> The main outcomes were that we will add support for more granular
>> >> definition of the upgrade lifecycle to the new composable services format,
>> >> and that we will explore moving towards the proposed lightweight HA
>> >> architecture to reduce the need for so much pacemaker specific logic.
>> >>
>> >> We also agreed that investigating use of mistral to drive upgrade 
>> >> workflows
>> >> was a good idea - currently we have a mixture of scripts combined with 
>> >> Heat
>> >> to drive the upgrade process, and some refactoring into discrete mistral
>> >> workflows may provide a more maintainable solution.  Potential for using
>> >> the existing SoftwareDeployment approach directly via mistral (outside of
>> >> the heat templates) was also discussed as something to be further
>> >> investigated and prototyped.
>> >>
>> >> We also touched on the CI implications of upgrades - we've got an upgrades
>> >> job now, but we need to ensure coverage of full release-to-release 
>> >> upgrades
>> >> (not just commit to commit).
>> >>
>> >> ---
>> >> Containerization status/roadmap
>> >> ---
>> >>
>> >> In this session we discussed the current status of containers in TripleO
>> >> (which is to say, the container based compute node which deploys 
>> >> containers
>> >> via Heat onto an an Atomic host node that is also deployed via Heat), and
>> >> what strategy is most appropriate to achieve a fully containerized TripleO
>> >> deployment.
>> >>
>> >> Several folks from Kolla participated in the session, and there was
>> >> significant focus on where work may happen such that further collaboration
>> >> between communities is possible.  To some extent this discussion on where
>> >> (as opposed to how) proved a distraction and prevented much discussion on
>> >> supportable architectural implementation for TripleO, thus what follows is
>> >> mostly my perspective on the issues that exist:
>> >>
>> >> Significant uncertainty exists wrt integration between Kolla and TripleO -
>> >> there's largely consensus that we want to consume the container images
>> >> defined by the Kolla community, but much less agreement that we can
>> >> feasably switch to the ansible-orchestrated deployment/config flow
>> >> supported by Kolla without breaking many of our primary operator 
>> >> interfaces
>> >> in a fundamentally unacceptable way, for example:
>> >>
>> >> - The Mistral based API is being implemented on the expectation that the
>> >>   primary interface to TripleO deployments is a parameters schema exposed
>> >>   by a series of Heat templates - this is no longer true in a "split 
>> >> stack"
>> >>   model where we hav

Re: [openstack-dev] [TripleO] Austin summit - session recap/summary

2016-05-18 Thread Derek Higgins
On 6 May 2016 at 14:18, Paul Belanger  wrote:
> On Tue, May 03, 2016 at 05:34:55PM +0100, Steven Hardy wrote:
>> Hi all,
>>
>> Some folks have requested a summary of our summit sessions, as has been
>> provided for some other projects.
>>
>> I'll probably go into more detail on some of these topics either via
>> subsequent more focussed threads an/or some blog posts but what follows is
>> an overview of our summit sessions[1] with notable actions or decisions
>> highlighted.  I'm including some of my own thoughts and conclusions, folks
>> are welcome/encouraged to follow up with their own clarifications or
>> different perspectives :)
>>
>> TripleO had a total of 5 sessions in Austin I'll cover them one-by-one:
>>
>> -
>> Upgrades - current status and roadmap
>> -
>>
>> In this session we discussed the current state of upgrades - initial
>> support for full major version upgrades has been implemented, but the
>> implementation is monolithic, highly coupled to pacemaker, and inflexible
>> with regard to third-party extraconfig changes.
>>
>> The main outcomes were that we will add support for more granular
>> definition of the upgrade lifecycle to the new composable services format,
>> and that we will explore moving towards the proposed lightweight HA
>> architecture to reduce the need for so much pacemaker specific logic.
>>
>> We also agreed that investigating use of mistral to drive upgrade workflows
>> was a good idea - currently we have a mixture of scripts combined with Heat
>> to drive the upgrade process, and some refactoring into discrete mistral
>> workflows may provide a more maintainable solution.  Potential for using
>> the existing SoftwareDeployment approach directly via mistral (outside of
>> the heat templates) was also discussed as something to be further
>> investigated and prototyped.
>>
>> We also touched on the CI implications of upgrades - we've got an upgrades
>> job now, but we need to ensure coverage of full release-to-release upgrades
>> (not just commit to commit).
>>
>> ---
>> Containerization status/roadmap
>> ---
>>
>> In this session we discussed the current status of containers in TripleO
>> (which is to say, the container based compute node which deploys containers
>> via Heat onto an an Atomic host node that is also deployed via Heat), and
>> what strategy is most appropriate to achieve a fully containerized TripleO
>> deployment.
>>
>> Several folks from Kolla participated in the session, and there was
>> significant focus on where work may happen such that further collaboration
>> between communities is possible.  To some extent this discussion on where
>> (as opposed to how) proved a distraction and prevented much discussion on
>> supportable architectural implementation for TripleO, thus what follows is
>> mostly my perspective on the issues that exist:
>>
>> Significant uncertainty exists wrt integration between Kolla and TripleO -
>> there's largely consensus that we want to consume the container images
>> defined by the Kolla community, but much less agreement that we can
>> feasably switch to the ansible-orchestrated deployment/config flow
>> supported by Kolla without breaking many of our primary operator interfaces
>> in a fundamentally unacceptable way, for example:
>>
>> - The Mistral based API is being implemented on the expectation that the
>>   primary interface to TripleO deployments is a parameters schema exposed
>>   by a series of Heat templates - this is no longer true in a "split stack"
>>   model where we have to hand off to an alternate service orchestration tool.
>>
>> - The tripleo-ui (based on the Mistral based API) consumes heat parameter
>>   schema to build it's UI, and Ansible doesn't support the necessary
>>   parameter schema definition (such as types and descriptions) to enable
>>   this pattern to be replicated.  Ansible also doesn't provide a HTTP API,
>>   so we'd still have to maintain and API surface for the (non python) UI to
>>   consume.
>>
>> We also discussed ideas around integration with kubernetes (a hot topic on
>> the Kolla track this summit), but again this proved inconclusive beyond
>> that yes someone should try developing a PoC to stimulate further
>> discussion.  Again, significant challenges exist:
>>
>> - We still need to maintain the Heat parameter interfaces for the API/UI,
>>   and there is also a strong preference to maintain puppet as a tool for
>>   generating service configuration (so that existing operator integrations
>>   via puppet continue to function) - this is a barrier to directly
>>   consuming the kolla-kubernetes effort directly.
>>
>> - A COE layer like kubernetes is a poor fit for deployments where operators
>>   require strict control of service placement (e.g exactly which nodes a 
>> service
>>   runs on, IP address assignments to specific nodes 

Re: [openstack-dev] [tripleo] Reason for installing devstack in CI pipeline

2016-05-18 Thread Derek Higgins
On 16 May 2016 at 18:16, Paul Belanger  wrote:
> Greetings,
>
> Over the last few weeks I've been head deep into understanding the TripleO CI
> pipeline.  For the most part, I am happy that we have merged centos-7 DIB
> support and I'm working to migrate the jobs to it.
>
> Something I have been trying to figure out, is why does the pipeline install
> devstack?  I cannot see anything currently in the toci_gate.sh script that is
> referencing devstack.  Everything seems to be related to launching the 
> external
> node.
>
> So, my question is, what is devstack doing?

Can you elaborate what you mean when you say we're installing
devstack, we currently make use of some of the devstack-gate scripts.
The main need  for this is so it clones the correct version of each
project to /opt/stack/new, the tripleo ci job then looks at
ZUUL_CHANGES[1] to get a list of projects being tested and uses those
git repositories to build rpm packages from them. The rest of the CI
job then uses these rpm's (layered on top of rdo trunk repositorys) so
they end up installed where appropriate.

[1] - 
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/toci_instack.sh#n103
>
> ---
> Paul
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][infra] HW upgrade for tripleo CI

2016-05-11 Thread Derek Higgins
On 11 May 2016 at 10:24, Derek Higgins <der...@redhat.com> wrote:
> Hi All,
> I'll be taking down the tripleo cloud today for the hardware
> upgrade, The cloud will be take down at about 1PM UTC and I'll start
> bringing it back up once dcops have finished installing the new HW. We
> should have everything back up an running for tomorrow.

Unfortunately we didn't get this completed tonight and will have to
pick up on it tomorrow. During the upgrade one of the hosts failed to
get passed its POST process, the host in question is the bastion
server which routes all traffic to the cloud and is our jump host for
access to the other servers. We'll be picking up on it tomorrow when
other members of the lab team are available to assist.

Thanks and sorry for the extended disruption,
Derek.

>
> thanks,
> Derek.
>
> On 6 May 2016 at 15:36, Derek Higgins <der...@redhat.com> wrote:
>> Hi All,
>>the long awaited RAM and SSD's have arrived for the tripleo rack,
>> I'd like to schedule a time next week to do the install which will
>> involve and outage window. We could attempt to do it node by node but
>> the controller needs to come down at some stage anyways and doing
>> other nodes groups at a time will take all day as we would have to
>> wait for jobs to finish on each one as we go along.
>>
>>I'm suggesting we do it on one of Monday(maybe a little soon at
>> this stage), Wednesday or Friday (mainly because those best suit me),
>> has anybody any suggestions why one day would be better over the
>> others?
>>
>>The other option is that we do nothing until the Rack is moved
>> later in the summer but the exact timing of this is now up in the air
>> a little so I think its best we just bite the bullet and do this ASAP
>> without waiting.
>>
>> thanks,
>> Derek.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][infra] HW upgrade for tripleo CI

2016-05-11 Thread Derek Higgins
Hi All,
I'll be taking down the tripleo cloud today for the hardware
upgrade, The cloud will be take down at about 1PM UTC and I'll start
bringing it back up once dcops have finished installing the new HW. We
should have everything back up an running for tomorrow.

thanks,
Derek.

On 6 May 2016 at 15:36, Derek Higgins <der...@redhat.com> wrote:
> Hi All,
>the long awaited RAM and SSD's have arrived for the tripleo rack,
> I'd like to schedule a time next week to do the install which will
> involve and outage window. We could attempt to do it node by node but
> the controller needs to come down at some stage anyways and doing
> other nodes groups at a time will take all day as we would have to
> wait for jobs to finish on each one as we go along.
>
>I'm suggesting we do it on one of Monday(maybe a little soon at
> this stage), Wednesday or Friday (mainly because those best suit me),
> has anybody any suggestions why one day would be better over the
> others?
>
>The other option is that we do nothing until the Rack is moved
> later in the summer but the exact timing of this is now up in the air
> a little so I think its best we just bite the bullet and do this ASAP
> without waiting.
>
> thanks,
> Derek.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][infra] HW upgrade for tripleo CI

2016-05-11 Thread Derek Higgins
On 6 May 2016 at 15:57, Ben Nemec <openst...@nemebean.com> wrote:
> \o/
>
> On 05/06/2016 09:36 AM, Derek Higgins wrote:
>> Hi All,
>>the long awaited RAM and SSD's have arrived for the tripleo rack,
>> I'd like to schedule a time next week to do the install which will
>> involve and outage window. We could attempt to do it node by node but
>> the controller needs to come down at some stage anyways and doing
>> other nodes groups at a time will take all day as we would have to
>> wait for jobs to finish on each one as we go along.
>>
>>I'm suggesting we do it on one of Monday(maybe a little soon at
>> this stage), Wednesday or Friday (mainly because those best suit me),
>> has anybody any suggestions why one day would be better over the
>> others?
>
> I would probably suggest Monday or Friday, since it usually seems like
> CI is the quietest at the ends of the week.
As it turned out I can no longer do friday so we're going to do it
today. Monday would probably have been a little better but lets take
the chance now while dcops have time available.

>
>>
>>The other option is that we do nothing until the Rack is moved
>> later in the summer but the exact timing of this is now up in the air
>> a little so I think its best we just bite the bullet and do this ASAP
>> without waiting.
>
> Yeah, that's still a couple of months off, and it seems silly to have
> all that hardware sitting around unused for so long.
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][infra] HW upgrade for tripleo CI

2016-05-11 Thread Derek Higgins
On 6 May 2016 at 17:50, James Slagle <james.sla...@gmail.com> wrote:
> On Fri, May 6, 2016 at 10:36 AM, Derek Higgins <der...@redhat.com> wrote:
>> Hi All,
>>the long awaited RAM and SSD's have arrived for the tripleo rack,
>> I'd like to schedule a time next week to do the install which will
>> involve and outage window. We could attempt to do it node by node but
>> the controller needs to come down at some stage anyways and doing
>> other nodes groups at a time will take all day as we would have to
>> wait for jobs to finish on each one as we go along.
>>
>>I'm suggesting we do it on one of Monday(maybe a little soon at
>> this stage), Wednesday or Friday (mainly because those best suit me),
>> has anybody any suggestions why one day would be better over the
>> others?
>
> +1 to going ahead with the upgrades now. I don't have much preference
> for either day...since you're going to be the one doing most of the
> work, I'd say pick the day that works best for you. I can plan on
> being available to help out if you need to hand anything over. Should
> we coordinate via an etherpad or anything?

Thanks this is going ahead this afternoon, I'll be bringing down all
the tripleo cloud at around 1pm UTC, and will bring it back up once
dcops have completed the HW upgrade. I'v been jotting down the steps
we need to coordinate on an etherpad, I'll send you the details this
afternoon.


>
>
>
>
> --
> -- James Slagle
> --
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO][infra] HW upgrade for tripleo CI

2016-05-06 Thread Derek Higgins
Hi All,
   the long awaited RAM and SSD's have arrived for the tripleo rack,
I'd like to schedule a time next week to do the install which will
involve and outage window. We could attempt to do it node by node but
the controller needs to come down at some stage anyways and doing
other nodes groups at a time will take all day as we would have to
wait for jobs to finish on each one as we go along.

   I'm suggesting we do it on one of Monday(maybe a little soon at
this stage), Wednesday or Friday (mainly because those best suit me),
has anybody any suggestions why one day would be better over the
others?

   The other option is that we do nothing until the Rack is moved
later in the summer but the exact timing of this is now up in the air
a little so I think its best we just bite the bullet and do this ASAP
without waiting.

thanks,
Derek.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][CI] Ability to reproduce failures

2016-04-13 Thread Derek Higgins
On 13 April 2016 at 09:58, Steven Hardy  wrote:
> On Tue, Apr 12, 2016 at 11:08:28PM +0200, Gabriele Cerami wrote:
>> On Fri, 2016-04-08 at 16:18 +0100, Steven Hardy wrote:
>>
>> > Note we're not using devtest at all anymore, the developer script
>> > many
>> > folks use is tripleo.sh:
>>
>> So, I followed the flow of the gate jobs starting from jenkins builder
>> script, and it seems like it's using devtest (or maybe something I
>> consider to be devtest but it's not, is devtest the part that creates
>> some environments, wait for them to be locked by gearman, and so on ?)
>
> So I think the confusion may step from the fact ./docs/TripleO-ci.rst is
> out of date.  Derek can confirm, but I think although there may be a few
> residual devtest pieces associated with managing the testenv VMs, there's
> nothing related to devtest used in the actual CI run itself anymore.
>
> See this commit:
>
> https://github.com/openstack-infra/tripleo-ci/commit/a85deb848007f0860ac32ac0096c5e45fe899cc5
>
> Since then we've moved to using tripleo.sh to drive most steps of the CI
> run, and many developers are using it also.  Previously the same was true
> of the devtest.sh script in tripleo-incubator, but now that is totally
> deprecated and unused (that it still exists in the repo is an oversight).
Devtest was used to generate the images that host the testenvs and we
should be soon getting rid of this. So I wouldn't spend a lot of time
looking at devtest it isn't used during the actual CI runs'

>
>> What I meant with "the script I'm using (created by Sagi) is not
>> creating the same enviroment" is that is not using the same test env
>> (with gearman and such) that the ci scripts are currently using.
>
> Sure, I guess my point is that for 99% of issues, the method used to create
> the VM is not important.  We use a slightly different method in CI to
> manage the VMs than in most developer environments, but if the requirement
> is to reproduce CI failures, you mostly care about deploying the exact same
> software, not so much how virsh was driven to create the VMs.

Yes, most of our errors can be reproduced outside of the CI
environment, but there are cases where we need an identical setup
(specifically for some transient errors), what we should do I think is
set aside a single test env host for these errors for anybody
debugging transient errors. I'll can get something together for this.
It wont be for general purpose developing but instead should be
limited for use only when debugging transient errors that don't
reproduce elsewhere.

>
> Thanks for digging into this, it's great to have some fresh eyes
> highlighting these sorts of issues! :)
Yes, its been a while since we had a updodate description of how
everything ties together so thanks for this. I've read through it and
added a few comments but in general what you have documented is on the
right track.


>
> Steve
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][CI] Ability to reproduce failures

2016-04-08 Thread Derek Higgins
On 7 April 2016 at 22:03, Gabriele Cerami  wrote:
> Hi,
>
> I'm trying to find an entry point to join the effort in TripleO CI.
Hi Gabriele, welcome aboard

> I studied the infrastructure and the scripts, but there's still something I'm 
> missing.
> The last step of studying the complex landscape of TripleO CI and the first 
> to start contributing
> is being able to reproduce failures in an accessible environment, to start 
> debugging issues.
> I have not found an easy and stable way to do this. Jobs are certainly 
> gathering
> a lot of logs, but that's not enough.
>
> At the moment, I started launching periodic jobs on my local test box using
> this script
> https://github.com/sshnaidm/various/blob/master/tripleo_repr.sh
>
> It's quite handy, but I'm not sure it's able to produce perfectly compatible
> environments with what's in CI.

Great, I haven't tried to run it but at quick glance this looks like
your doing most of main steps that are needed to mimic CI, I haven't
seen anything that is obviously missing what kind of differences are
you seeing in the results when compared to CI?

>
> Can anyone suggest a way to make jobs reproducible locally? I know it may be 
> complicated
> to setup an environment through devtest, but may If we can start with just a 
> list of steps,
> then it would be easier to put them into a script, hten make it availabe in 
> the log in place
> of the current reproduce.sh that is not very useful.

So, I think the problem with reproduce.sh is that nobody in tripleo
has ever used it, and as a result toci_gate_test.sh and
toci_instack.sh just aren't compatible with it, I'd suggest we change
the the toci_* scripts so that they play nice together. I'll see if I
can give it a whirl over the next few days and see what problem we're
likely to hit.

>
> thanks for any feedback.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] becoming third party CI

2016-03-21 Thread Derek Higgins
On 17 March 2016 at 16:59, Ben Nemec  wrote:
> On 03/10/2016 05:24 PM, Jeremy Stanley wrote:
>> On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote:
>>> This seems to be the week people want to pile it on TripleO. Talking
>>> about upstream is great but I suppose I'd rather debate major changes
>>> after we branch Mitaka. :/
>> [...]
>>
>> I didn't mean to pile on TripleO, nor did I intend to imply this was
>> something which should happen ASAP (or even necessarily at all), but
>> I do want to better understand what actual benefit is currently
>> derived from this implementation vs. a more typical third-party CI
>> (which lots of projects are doing when they find their testing needs
>> are not met by the constraints of our generic test infrastructure).
>>
>>> With regards to Jenkins restarts I think it is understood that our job
>>> times are long. How often do you find infra needs to restart Jenkins?
>>
>> We're restarting all 8 of our production Jenkins masters weekly at a
>> minimum, but generally more often when things are busy (2-3 times a
>> week). For many months we've been struggling with a thread leak for
>> which their development team has not seen as a priority to even
>> triage our bug report effectively. At this point I think we've
>> mostly given up on expecting it to be solved by anything other than
>> our upcoming migration off of Jenkins, but that's another topic
>> altogether.
>>
>>> And regardless of that what if we just said we didn't mind the
>>> destructiveness of losing a few jobs now and then (until our job
>>> times are under the line... say 1.5 hours or so). To be clear I'd
>>> be fine with infra pulling the rug on running jobs if this is the
>>> root cause of the long running jobs in TripleO.
>>
>> For manual Jenkins restarts this is probably doable (if additional
>> hassle), but I don't know whether that's something we can easily
>> shoehorn into our orchestrated/automated restarts.
>>
>>> I think the "benefits are minimal" is bit of an overstatement. The
>>> initial vision for TripleO CI stands and I would still like to see
>>> individual projects entertain the option to use us in their gates.
>> [...]
>>
>> This is what I'd like to delve deeper into. The current
>> implementation isn't providing you with any mechanism to prevent
>> changes which fail jobs running in the tripleo-test cloud from
>> merging to your repos, is it? You're still having to manually
>> inspect the job results posted by it? How is that particularly
>> different from relying on third-party CI integration?
>>
>> As for other projects making use of the same jobs, right now the
>> only convenience I'm aware of is that they can add check-tripleo
>> pipeline jobs in our Zuul layout file instead of having you add it
>> to yours (which could itself reside in a Git repo under your
>> control, giving you even more flexibility over those choices). In
>> fact, with a third-party CI using its own separate Gerrit account,
>> you would be able to leave clear -1/+1 votes on check results which
>> is not possible with the present solution.
>>
>> So anyway, I'm not saying that I definitely believe the third-party
>> CI route will be better for TripleO, but I'm not (yet) clear on what
>> tangible benefit you're receiving now that you lose by switching to
>> that model.
>>
>
> FWIW, I think third-party CI probably makes sense for TripleO.
> Practically speaking we are third-party CI right now - we run our own
> independent hardware infrastructure, we aren't multi-region, and we
> can't leave a vote on changes.  Since the first two aren't likely to
> change any time soon (although I believe it's still a long-term goal to
> get to a place where we can run in regular infra and just contribute our
> existing CI hardware to the general infra pool, but that's still a long
> way off), and moving to actual third-party CI would get us the ability
> to vote, I think it's worth pursuing.
>
> As an added bit of fun, we have a forced move of our CI hardware coming
> up in the relatively near future, and if we don't want to have multiple
> days (and possibly more, depending on how the move goes) of TripleO CI
> outage we're probably going to need to stand up a new environment in
> parallel anyway.  If we're doing that it might make sense to try hooking
> it in through the third-party infra instead of the way we do it today.
> Hopefully that would allow us to work out the kinks before the old
> environment goes away.
>
> Anyway, I'm sure we'll need a bunch more discussion about this, but I
> wanted to chime in with my two cents.

We need to answer this question soon, I'm currently working on the CI
parts that we need in order of move to OVB[1] and was assuming we
would be maintaining the status quo. What we end up doing would look
very different if we move to 3rd party CI, if using 3rd party CI we
can simply start a vanilla centos instance at use it as an undercloud.
It can then create its own baremetal 

Re: [openstack-dev] [TripleO] propose trown for core

2016-03-21 Thread Derek Higgins
On 20 March 2016 at 18:32, Dan Prince  wrote:
> I'd like to propose that we add John Trowbridge to the TripleO core
> review team. John has become one of the goto guys in helping to chase
> down upstream trunk chasing issues. He has contributed a lot to helping
> keep general CI issues running and been involved with several new
> features over the past year around node introspection, etc. His
> involvement with the RDO team also gives him a healthy prospective
> about sane releasing practices, etc.
>
> John doesn't have the highest TripleO review stats ATM but I expect his
> stats to continue to climb. Especially with his work on upcoming
> improvements like tripleo-quickstart, etc. Having John on board the
> core team would help drive these projects and it would also be great to
> have him able to land fixes related to trunk chasing, etc. I expect
> he'll gradually jump into helping with other TripleO projects as well.
>
> If you agree please +1. If there is no negative feedback I'll add him
> next Monday.

+1 from me

>
> Dan
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] propose EmilienM for core

2016-03-21 Thread Derek Higgins
On 20 March 2016 at 18:22, Dan Prince  wrote:
> I'd like to propose that we add Emilien Macchi to the TripleO core
> review team. Emilien has been getting more involved with TripleO during
> this last release. In addition to help with various Puppet things he
> also has experience in building OpenStack installation tooling,
> upgrades, and would be a valuable prospective to the core team. He has
> also added several new features around monitoring into instack-
> undercloud.
>
> Emilien is currently acting as the Puppet PTL. Adding him to the
> TripleO core review team could help us move faster towards some of the
> upcoming features like composable services, etc.
>
> If you agree please +1. If there is no negative feedback I'll add him
> next Monday.
+1
>
> Dan
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] rabbitmq / ipv6 issue

2016-03-16 Thread Derek Higgins
On 16 March 2016 at 02:41, Emilien Macchi  wrote:
> I did some testing again and I'm still running in curl issues:
> http://paste.openstack.org/show/BU7UY0mUrxoMUGDhXgWs/
>
> I'll continue investigation tomorrow.

btw, tripleo-ci seems to be doing reasonably well this morning, I
don't see any failures over the last few hours so the problem your
seeing looks to be something that isn't a problem in all cases


>
> On Tue, Mar 15, 2016 at 8:00 PM, Emilien Macchi  wrote:
>> Both Pull-requests got merged upstream (kudos to Puppetlabs).
>>
>> I rebased https://review.openstack.org/#/c/289445/ on master and
>> abandoned the pin. Let's see how CI works now.
>> If it still does not work, feel free to restore the pin and rebase
>> again on the pin, so we can make progress.
>>
>> On Tue, Mar 15, 2016 at 6:21 PM, Emilien Macchi  wrote:
>>> So this is an attempt to fix everything in Puppet modules:
>>>
>>> * https://github.com/puppetlabs/puppetlabs-stdlib/pull/577
>>> * https://github.com/puppetlabs/puppetlabs-rabbitmq/pull/443
>>>
>>> If we have the patches like this, there will be no need to patch TripleO.
>>>
>>> Please review the patches if needed,
>>> Thanks
>>>
>>> On Tue, Mar 15, 2016 at 1:57 PM, Emilien Macchi  wrote:
 So from now, we pin [5] puppetlabs-rabbitmq to the commit before [3]
 and I rebased Attila's patch to test CI again.
 This pin is a workaround, in the meantime we are working on a fix in
 puppetlabs-rabbitmq.

 [5] https://review.openstack.org/293074

 I also reported the issue in TripleO Launchpad:
 https://bugs.launchpad.net/tripleo/+bug/1557680

 Also a quick note:
 Puppet OpenStack CI did not detect this failure because we don't
 deploy puppetlabs-rabbitmq from master but from the latest release
 (tag).

 On Tue, Mar 15, 2016 at 1:17 PM, Emilien Macchi  wrote:
> TL;DR;This e-mail tracks down the work done to make RabbitMQ working
> on IPv6 deployments.
> It's currently broken and we might need to patch different Puppet
> modules to make it work.
>
> Long story:
>
> Attila Darazs is currently working on [1] to get IPv6 tested by
> TripleO CI but is stuck because a RabbitMQ issue in Puppet catalog
> [2], reported by Dan Sneddon.
> [1] https://review.openstack.org/#/c/289445
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1317693
>
> [2] is caused by a patch in puppetlabs-rabbitmq [3], that change the
> way we validate RabbitMQ is working from testing localhost to testing
> the actual binding IP.
> [3] 
> https://github.com/puppetlabs/puppetlabs-rabbitmq/commit/dac8de9d95c5771b7ef7596b73a59d4108138e3a
>
> The problem is that when testing the actual IPv6, it curls fails for
> some different reasons explained on [4] by Sofer.
> [4] https://review.openstack.org/#/c/292664/
>
> So we need to investigate puppetlabs-rabbitmq and puppet-staging to
> see if whether or not we need to change something there.
> For now, I don't think we need to patch anything in TripleO Heat
> Templates, but we'll see after the investigation.
>
> I'm currently working on this task, but any help is welcome,
> --
> Emilien Macchi



 --
 Emilien Macchi
>>>
>>>
>>>
>>> --
>>> Emilien Macchi
>>
>>
>>
>> --
>> Emilien Macchi
>
>
>
> --
> Emilien Macchi
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] core cleanup

2016-03-15 Thread Derek Higgins
On 14 March 2016 at 14:26, Dan Prince  wrote:
> Looking at the stats for the last 180 days I'd like to propose we
> cleanup TripleO core a bit:
>
> http://russellbryant.net/openstack-stats/tripleo-reviewers-180.txt
>
> There are a few reviewers with low numbers of reviews (just added to
> the tripleo-ui team, jtomasek and flfuchs). I think those will ramp up
> soon as the UI repos become part of upstream.
>
> For the rest of the numbers I roughly looked at those below the 100
> reviews mark. So I'd like to propose we cut the following users from
> TripleO core:
>
> stevenk
> jprovazn
> tomas-8c8
> tchaypo
> ghe.rivero
> lsmola
> pblaho
> cmsj
>
> 
>
> If existing TripleO core members could review the above list and +/- 1
> over the next week that would be great so we can proceed with the
> cleanup.

+1

>
> Thanks,
>
> Dan
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] propose ejuaso for core

2016-03-15 Thread Derek Higgins
On 14 March 2016 at 14:38, Dan Prince  wrote:
> http://russellbryant.net/openstack-stats/tripleo-reviewers-180.txt
>
> Our top reviewer over the last half a year ejuaso (goes by Ozz for
> Osorio or jaosorior on IRC). His reviews seem consistent, he
> consistently attends the meetings and he chimes in on lots of things.
> I'd like to propose we add him to our core team (probably long overdue
> now too).
>
> If you agree please +1. If there is no negative feedback I'll add him
> next Monday.
+1

>
> Dan
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] CI jobs failures

2016-03-09 Thread Derek Higgins
On 9 March 2016 at 07:08, Richard Su <r...@redhat.com> wrote:
>
>
> On 03/08/2016 09:58 AM, Derek Higgins wrote:
>>
>> On 7 March 2016 at 18:22, Ben Nemec <openst...@nemebean.com> wrote:
>>>
>>> On 03/07/2016 11:33 AM, Derek Higgins wrote:
>>>>
>>>> On 7 March 2016 at 15:24, Derek Higgins <der...@redhat.com> wrote:
>>>>>
>>>>> On 6 March 2016 at 16:58, James Slagle <james.sla...@gmail.com> wrote:
>>>>>>
>>>>>> On Sat, Mar 5, 2016 at 11:15 AM, Emilien Macchi <emil...@redhat.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I'm kind of hijacking Dan's e-mail but I would like to propose some
>>>>>>> technical improvements to stop having so much CI failures.
>>>>>>>
>>>>>>>
>>>>>>> 1/ Stop creating swap files. We don't have SSD, this is IMHO a
>>>>>>> terrible
>>>>>>> mistake to swap on files because we don't have enough RAM. In my
>>>>>>> experience, swaping on non-SSD disks is even worst that not having
>>>>>>> enough RAM. We should stop doing that I think.
>>>>>>
>>>>>> We have been relying on swap in tripleo-ci for a little while. While
>>>>>> not ideal, it has been an effective way to at least be able to test
>>>>>> what we've been testing given the amount of physical RAM that is
>>>>>> available.
>>>>>
>>>>> Ok, so I have a few points here, in places where I'm making
>>>>> assumptions I'll try to point it out
>>>>>
>>>>> o Yes I agree using swap should be avoided if at all possible
>>>>>
>>>>> o We are currently looking into adding more RAM to our testenv hosts,
>>>>> it which point we can afford to be a little more liberal with Memory
>>>>> and this problem should become less of an issue, having said that
>>>>>
>>>>> o Even though using swap is bad, if we have some processes with a
>>>>> large Mem footprint that don't require constant access to a portion of
>>>>> the footprint swaping it out over the duration of the CI test isn't as
>>>>> expensive as it would suggest (assuming it doesn't need to be swapped
>>>>> back in and the kernel has selected good candidates to swap out)
>>>>>
>>>>> o The test envs that host the undercloud and overcloud nodes have 64G
>>>>> of RAM each, they each host 4 testenvs and each test env if running a
>>>>> HA job can use up to 21G of RAM so we have over committed there, it
>>>>> this is only a problem if a test env host gets 4 HA jobs that are
>>>>> started around the same time (and as a result a each have 4 overcloud
>>>>> nodes running at the same time), to allow this to happen without VM's
>>>>> being killed by the OOM we've also enabled swap there. The majority of
>>>>> the time this swap isn't in use, only if all 4 testenvs are being
>>>>> simultaneously used and they are all running the second half of a CI
>>>>> test at the same time.
>>>>>
>>>>> o The overcloud nodes are VM's running with a "unsafe" disk caching
>>>>> mechanism, this causes sync requests from guest to be ignored and as a
>>>>> result if the instances being hosted on these nodes are going into
>>>>> swap this swap will be cached on the host as long as RAM is available.
>>>>> i.e. swap being used in the undercloud or overcloud isn't being synced
>>>>> to the disk on the host unless it has to be.
>>>>>
>>>>> o What I'd like us to avoid is simply bumping up the memory every time
>>>>> we hit a OOM error without at least
>>>>>1. Explaining why we need more memory all of a sudden
>>>>>2. Looking into a way we may be able to avoid simply bumping the RAM
>>>>> (at peak times we are memory constrained)
>>>>>
>>>>> as an example, Lets take a look at the swap usage on the undercloud of
>>>>> a recent ci nonha job[1][2], These insances have 5G of RAM with 2G or
>>>>> swap enabled via a swapfile
>>>>> the overcloud deploy started @22:07:46 and finished at @22:28:06
>>>>>
>>>>> In the graph you'll see a spike in memory being swapped out around
>>&

Re: [openstack-dev] [tripleo] CI jobs failures

2016-03-08 Thread Derek Higgins
On 7 March 2016 at 18:22, Ben Nemec <openst...@nemebean.com> wrote:
> On 03/07/2016 11:33 AM, Derek Higgins wrote:
>> On 7 March 2016 at 15:24, Derek Higgins <der...@redhat.com> wrote:
>>> On 6 March 2016 at 16:58, James Slagle <james.sla...@gmail.com> wrote:
>>>> On Sat, Mar 5, 2016 at 11:15 AM, Emilien Macchi <emil...@redhat.com> wrote:
>>>>> I'm kind of hijacking Dan's e-mail but I would like to propose some
>>>>> technical improvements to stop having so much CI failures.
>>>>>
>>>>>
>>>>> 1/ Stop creating swap files. We don't have SSD, this is IMHO a terrible
>>>>> mistake to swap on files because we don't have enough RAM. In my
>>>>> experience, swaping on non-SSD disks is even worst that not having
>>>>> enough RAM. We should stop doing that I think.
>>>>
>>>> We have been relying on swap in tripleo-ci for a little while. While
>>>> not ideal, it has been an effective way to at least be able to test
>>>> what we've been testing given the amount of physical RAM that is
>>>> available.
>>>
>>> Ok, so I have a few points here, in places where I'm making
>>> assumptions I'll try to point it out
>>>
>>> o Yes I agree using swap should be avoided if at all possible
>>>
>>> o We are currently looking into adding more RAM to our testenv hosts,
>>> it which point we can afford to be a little more liberal with Memory
>>> and this problem should become less of an issue, having said that
>>>
>>> o Even though using swap is bad, if we have some processes with a
>>> large Mem footprint that don't require constant access to a portion of
>>> the footprint swaping it out over the duration of the CI test isn't as
>>> expensive as it would suggest (assuming it doesn't need to be swapped
>>> back in and the kernel has selected good candidates to swap out)
>>>
>>> o The test envs that host the undercloud and overcloud nodes have 64G
>>> of RAM each, they each host 4 testenvs and each test env if running a
>>> HA job can use up to 21G of RAM so we have over committed there, it
>>> this is only a problem if a test env host gets 4 HA jobs that are
>>> started around the same time (and as a result a each have 4 overcloud
>>> nodes running at the same time), to allow this to happen without VM's
>>> being killed by the OOM we've also enabled swap there. The majority of
>>> the time this swap isn't in use, only if all 4 testenvs are being
>>> simultaneously used and they are all running the second half of a CI
>>> test at the same time.
>>>
>>> o The overcloud nodes are VM's running with a "unsafe" disk caching
>>> mechanism, this causes sync requests from guest to be ignored and as a
>>> result if the instances being hosted on these nodes are going into
>>> swap this swap will be cached on the host as long as RAM is available.
>>> i.e. swap being used in the undercloud or overcloud isn't being synced
>>> to the disk on the host unless it has to be.
>>>
>>> o What I'd like us to avoid is simply bumping up the memory every time
>>> we hit a OOM error without at least
>>>   1. Explaining why we need more memory all of a sudden
>>>   2. Looking into a way we may be able to avoid simply bumping the RAM
>>> (at peak times we are memory constrained)
>>>
>>> as an example, Lets take a look at the swap usage on the undercloud of
>>> a recent ci nonha job[1][2], These insances have 5G of RAM with 2G or
>>> swap enabled via a swapfile
>>> the overcloud deploy started @22:07:46 and finished at @22:28:06
>>>
>>> In the graph you'll see a spike in memory being swapped out around
>>> 22:09, this corresponds almost exactly to when the overcloud image is
>>> being downloaded from swift[3], looking the top output at the end of
>>> the test you'll see that swift-proxy is using over 500M of Mem[4].
>>>
>>> I'd much prefer we spend time looking into why the swift proxy is
>>> using this much memory rather then blindly bump the memory allocated
>>> to the VM, perhaps we have something configured incorrectly or we've
>>> hit a bug in swift.
>>>
>>> Having said all that we can bump the memory allocated to each node but
>>> we have to accept 1 of 2 possible consequences
>>> 1. We'll env up using the swap on the testenv hosts more then we
>>> currently are or
>>> 2. We'

Re: [openstack-dev] [tripleo] CI jobs failures

2016-03-07 Thread Derek Higgins
On 7 March 2016 at 12:11, John Trowbridge  wrote:
>
>
> On 03/06/2016 11:58 AM, James Slagle wrote:
>> On Sat, Mar 5, 2016 at 11:15 AM, Emilien Macchi  wrote:
>>> I'm kind of hijacking Dan's e-mail but I would like to propose some
>>> technical improvements to stop having so much CI failures.
>>>
>>>
>>> 1/ Stop creating swap files. We don't have SSD, this is IMHO a terrible
>>> mistake to swap on files because we don't have enough RAM. In my
>>> experience, swaping on non-SSD disks is even worst that not having
>>> enough RAM. We should stop doing that I think.
>>
>> We have been relying on swap in tripleo-ci for a little while. While
>> not ideal, it has been an effective way to at least be able to test
>> what we've been testing given the amount of physical RAM that is
>> available.
>>
>> The recent change to add swap to the overcloud nodes has proved to be
>> unstable. But that has more to do with it being racey with the
>> validation deployment afaict. There are some patches currently up to
>> address those issues.
>>
>>>
>>>
>>> 2/ Split CI jobs in scenarios.
>>>
>>> Currently we have CI jobs for ceph, HA, non-ha, containers and the
>>> current situation is that jobs fail randomly, due to performances issues.
>>>
>>> Puppet OpenStack CI had the same issue where we had one integration job
>>> and we never stopped adding more services until all becomes *very*
>>> unstable. We solved that issue by splitting the jobs and creating scenarios:
>>>
>>> https://github.com/openstack/puppet-openstack-integration#description
>>>
>>> What I propose is to split TripleO jobs in more jobs, but with less
>>> services.
>>>
>>> The benefit of that:
>>>
>>> * more services coverage
>>> * jobs will run faster
>>> * less random issues due to bad performances
>>>
>>> The cost is of course it will consume more resources.
>>> That's why I suggest 3/.
>>>
>>> We could have:
>>>
>>> * HA job with ceph and a full compute scenario (glance, nova, cinder,
>>> ceilometer, aodh & gnocchi).
>>> * Same with IPv6 & SSL.
>>> * HA job without ceph and full compute scenario too
>>> * HA job without ceph and basic compute (glance and nova), with extra
>>> services like Trove, Sahara, etc.
>>> * ...
>>> (note: all jobs would have network isolation, which is to me a
>>> requirement when testing an installer like TripleO).
>>
>> Each of those jobs would at least require as much memory as our
>> current HA job. I don't see how this gets us to using less memory. The
>> HA job we have now already deploys the minimal amount of services that
>> is possible given our current architecture. Without the composable
>> service roles work, we can't deploy less services than we already are.
>>
>>
>>
>>>
>>> 3/ Drop non-ha job.
>>> I'm not sure why we have it, and the benefit of testing that comparing
>>> to HA.
>>
>> In my opinion, I actually think that we could drop the ceph and non-ha
>> job from the check-tripleo queue.
>>
>> non-ha doesn't test anything realistic, and it doesn't really provide
>> any faster feedback on patches. It seems at most it might run 15-20
>> minutes faster than the HA job on average. Sometimes it even runs
>> slower than the HA job.
>>
>> The ceph job we could move to the experimental queue to run on demand
>> on patches that might affect ceph, and it could also be a daily
>> periodic job.
>>
>> The same could be done for the containers job, an IPv6 job, and an
>> upgrades job. Ideally with a way to run an individual job as needed.
>> Would we need different experimental queues to do that?
>>
>> That would leave only the HA job in the check queue, which we should
>> run with SSL and network isolation. We could deploy less testenv's
>> since we'd have less jobs running, but give the ones we do deploy more
>> RAM. I think this would really alleviate a lot of the transient
>> intermittent failures we get in CI currently. It would also likely run
>> faster.
>>
>> It's probably worth seeking out some exact evidence from the RDO
>> centos-ci, because I think they are testing with virtual environments
>> that have a lot more RAM than tripleo-ci does. It'd be good to
>> understand if they have some of the transient failures that tripleo-ci
>> does as well.
>>
>
> The HA job in RDO CI is also more unstable than nonHA, although this is
> usually not to do with memory contention. Most of the time that I see
> the HA job fail spuriously in RDO CI, it is because of the Nova
> scheduler race. I would bet that this race is the cause for the
> fluctuating amount of time jobs take as well, because the recovery
> mechanism for this is just to retry. Those retries can add 15 min. per
> retry to the deploy. In RDO CI there is a 60min. timeout for deploy as
> well. If we can't deploy to virtual machines in under an hour, to me
> that is a bug. (Note, I am speaking of `openstack overcloud deploy` when
> I say deploy, though start to finish can take less than an hour with
> decent CPUs)
>
> RDO CI uses the 

Re: [openstack-dev] [tripleo] CI jobs failures

2016-03-07 Thread Derek Higgins
On 7 March 2016 at 15:24, Derek Higgins <der...@redhat.com> wrote:
> On 6 March 2016 at 16:58, James Slagle <james.sla...@gmail.com> wrote:
>> On Sat, Mar 5, 2016 at 11:15 AM, Emilien Macchi <emil...@redhat.com> wrote:
>>> I'm kind of hijacking Dan's e-mail but I would like to propose some
>>> technical improvements to stop having so much CI failures.
>>>
>>>
>>> 1/ Stop creating swap files. We don't have SSD, this is IMHO a terrible
>>> mistake to swap on files because we don't have enough RAM. In my
>>> experience, swaping on non-SSD disks is even worst that not having
>>> enough RAM. We should stop doing that I think.
>>
>> We have been relying on swap in tripleo-ci for a little while. While
>> not ideal, it has been an effective way to at least be able to test
>> what we've been testing given the amount of physical RAM that is
>> available.
>
> Ok, so I have a few points here, in places where I'm making
> assumptions I'll try to point it out
>
> o Yes I agree using swap should be avoided if at all possible
>
> o We are currently looking into adding more RAM to our testenv hosts,
> it which point we can afford to be a little more liberal with Memory
> and this problem should become less of an issue, having said that
>
> o Even though using swap is bad, if we have some processes with a
> large Mem footprint that don't require constant access to a portion of
> the footprint swaping it out over the duration of the CI test isn't as
> expensive as it would suggest (assuming it doesn't need to be swapped
> back in and the kernel has selected good candidates to swap out)
>
> o The test envs that host the undercloud and overcloud nodes have 64G
> of RAM each, they each host 4 testenvs and each test env if running a
> HA job can use up to 21G of RAM so we have over committed there, it
> this is only a problem if a test env host gets 4 HA jobs that are
> started around the same time (and as a result a each have 4 overcloud
> nodes running at the same time), to allow this to happen without VM's
> being killed by the OOM we've also enabled swap there. The majority of
> the time this swap isn't in use, only if all 4 testenvs are being
> simultaneously used and they are all running the second half of a CI
> test at the same time.
>
> o The overcloud nodes are VM's running with a "unsafe" disk caching
> mechanism, this causes sync requests from guest to be ignored and as a
> result if the instances being hosted on these nodes are going into
> swap this swap will be cached on the host as long as RAM is available.
> i.e. swap being used in the undercloud or overcloud isn't being synced
> to the disk on the host unless it has to be.
>
> o What I'd like us to avoid is simply bumping up the memory every time
> we hit a OOM error without at least
>   1. Explaining why we need more memory all of a sudden
>   2. Looking into a way we may be able to avoid simply bumping the RAM
> (at peak times we are memory constrained)
>
> as an example, Lets take a look at the swap usage on the undercloud of
> a recent ci nonha job[1][2], These insances have 5G of RAM with 2G or
> swap enabled via a swapfile
> the overcloud deploy started @22:07:46 and finished at @22:28:06
>
> In the graph you'll see a spike in memory being swapped out around
> 22:09, this corresponds almost exactly to when the overcloud image is
> being downloaded from swift[3], looking the top output at the end of
> the test you'll see that swift-proxy is using over 500M of Mem[4].
>
> I'd much prefer we spend time looking into why the swift proxy is
> using this much memory rather then blindly bump the memory allocated
> to the VM, perhaps we have something configured incorrectly or we've
> hit a bug in swift.
>
> Having said all that we can bump the memory allocated to each node but
> we have to accept 1 of 2 possible consequences
> 1. We'll env up using the swap on the testenv hosts more then we
> currently are or
> 2. We'll have to reduce the number of test envs per host from 4 down
> to 3, wiping 25% of our capacity

Thinking about this a little more, we could do a radical experiment
for a week and just do this, i.e. bump up the RAM on each env and
accept we loose 25 of our capacity, maybe it doesn't matter, if our
success rate goes up then we'd be running less rechecks anyways.
The downside is that we'd probably hit less timing errors (assuming
the tight resources is whats showing them up), I say downside because
this just means downstream users might hit them more often if CI
isn't. Anyways maybe worth discussing at tomorrows meeting.


>
> [1] - 
> http://logs.openstack.org/85/289085/2/check-tripleo/gate-tripleo-ci-f22-nonha/6fda33c/
&g

Re: [openstack-dev] [tripleo] CI jobs failures

2016-03-07 Thread Derek Higgins
On 6 March 2016 at 16:58, James Slagle  wrote:
> On Sat, Mar 5, 2016 at 11:15 AM, Emilien Macchi  wrote:
>> I'm kind of hijacking Dan's e-mail but I would like to propose some
>> technical improvements to stop having so much CI failures.
>>
>>
>> 1/ Stop creating swap files. We don't have SSD, this is IMHO a terrible
>> mistake to swap on files because we don't have enough RAM. In my
>> experience, swaping on non-SSD disks is even worst that not having
>> enough RAM. We should stop doing that I think.
>
> We have been relying on swap in tripleo-ci for a little while. While
> not ideal, it has been an effective way to at least be able to test
> what we've been testing given the amount of physical RAM that is
> available.

Ok, so I have a few points here, in places where I'm making
assumptions I'll try to point it out

o Yes I agree using swap should be avoided if at all possible

o We are currently looking into adding more RAM to our testenv hosts,
it which point we can afford to be a little more liberal with Memory
and this problem should become less of an issue, having said that

o Even though using swap is bad, if we have some processes with a
large Mem footprint that don't require constant access to a portion of
the footprint swaping it out over the duration of the CI test isn't as
expensive as it would suggest (assuming it doesn't need to be swapped
back in and the kernel has selected good candidates to swap out)

o The test envs that host the undercloud and overcloud nodes have 64G
of RAM each, they each host 4 testenvs and each test env if running a
HA job can use up to 21G of RAM so we have over committed there, it
this is only a problem if a test env host gets 4 HA jobs that are
started around the same time (and as a result a each have 4 overcloud
nodes running at the same time), to allow this to happen without VM's
being killed by the OOM we've also enabled swap there. The majority of
the time this swap isn't in use, only if all 4 testenvs are being
simultaneously used and they are all running the second half of a CI
test at the same time.

o The overcloud nodes are VM's running with a "unsafe" disk caching
mechanism, this causes sync requests from guest to be ignored and as a
result if the instances being hosted on these nodes are going into
swap this swap will be cached on the host as long as RAM is available.
i.e. swap being used in the undercloud or overcloud isn't being synced
to the disk on the host unless it has to be.

o What I'd like us to avoid is simply bumping up the memory every time
we hit a OOM error without at least
  1. Explaining why we need more memory all of a sudden
  2. Looking into a way we may be able to avoid simply bumping the RAM
(at peak times we are memory constrained)

as an example, Lets take a look at the swap usage on the undercloud of
a recent ci nonha job[1][2], These insances have 5G of RAM with 2G or
swap enabled via a swapfile
the overcloud deploy started @22:07:46 and finished at @22:28:06

In the graph you'll see a spike in memory being swapped out around
22:09, this corresponds almost exactly to when the overcloud image is
being downloaded from swift[3], looking the top output at the end of
the test you'll see that swift-proxy is using over 500M of Mem[4].

I'd much prefer we spend time looking into why the swift proxy is
using this much memory rather then blindly bump the memory allocated
to the VM, perhaps we have something configured incorrectly or we've
hit a bug in swift.

Having said all that we can bump the memory allocated to each node but
we have to accept 1 of 2 possible consequences
1. We'll env up using the swap on the testenv hosts more then we
currently are or
2. We'll have to reduce the number of test envs per host from 4 down
to 3, wiping 25% of our capacity

[1] - 
http://logs.openstack.org/85/289085/2/check-tripleo/gate-tripleo-ci-f22-nonha/6fda33c/
[2] - http://goodsquishy.com/downloads/20160307/swap.png
[3] - 22:09:03 21678 INFO [-] Master cache miss for image
b6a96213-7955-4c4d-829e-871350939e03, starting download
  22:09:41 21678 DEBUG [-] Running cmd (subprocess): qemu-img info
/var/lib/ironic/master_images/tmpvjAlCU/b6a96213-7955-4c4d-829e-871350939e03.part
[4] - 17690 swift 20   0  804824 547724   1780 S   0.0 10.8
0:04.82 swift-prox+


>
> The recent change to add swap to the overcloud nodes has proved to be
> unstable. But that has more to do with it being racey with the
> validation deployment afaict. There are some patches currently up to
> address those issues.
>
>>
>>
>> 2/ Split CI jobs in scenarios.
>>
>> Currently we have CI jobs for ceph, HA, non-ha, containers and the
>> current situation is that jobs fail randomly, due to performances issues.

We don't know it due to performance issues, Your probably correct that
we wouldn't see them if we were allocating more resources to the ci
tests but this just means we have timing issues that are more
prevalent when resource 

Re: [openstack-dev] [TripleO] Stable branch policy for Mitaka

2016-02-10 Thread Derek Higgins



On 10/02/16 18:05, James Slagle wrote:



On Wed, Feb 10, 2016 at 4:57 PM, Steven Hardy > wrote:

Hi all,

We discussed this in our meeting[1] this week, and agreed a ML
discussion
to gain consensus and give folks visibility of the outcome would be
a good
idea.

In summary, we adopted a more permissive "release branch" policy[2]
for our
stable/liberty branches, where feature backports would be allowed,
provided
they worked with liberty and didn't break backwards compatibility.

The original idea was really to provide a mechanism to "catch up" where
features are added e.g to liberty OpenStack components late in the cycle
and TripleO requires changes to integrate with them.

However, the reality has been that the permissive backport policy
has been
somewhat abused (IMHO) with a large number of major features being
proposed
for backport, and in a few cases this has broken downstream (RDO)
consumers
of TripleO.

Thus, I would propose that from Mitaka, we revise our backport policy to
simply align with the standard stable branch model observed by all
projects[3].

Hopefully this will allow us to retain the benefits of the stable branch
process, but provide better stability for downstream consumers of these
branches, and minimise confusion regarding what is a permissable
backport.

If we do this, only backports that can reasonably be considered
"Appropriate fixes"[4] will be valid backports - in the majority of
cases
this will mean bugfixes only, and large features where the risk of
regression is significant will not be allowed.

What are peoples thoughts on this?


​I'm in agreement. I think this change is needed and will help set
better expectations around what will be included in which release.

If we adopt this as the new policy, then the immediate followup is to
set and communicate when we'll be cutting the stable branches, so that
it's understood when the features have to be done/committed. I'd suggest
that we more or less completely adopt the integrated release
schedule[1]. Which I believe means the week of RC1 for cutting the
stable/mitaka branches, which is March 14th-18th.

It seems to follow logically then that we'd then want to also be more
aggresively aligned with other integrated release events such as the
feature freeze date, Feb 29th - March 4th.

An alternative to strictly following the schedule, would be to say that
TripleO lags the integrated release dates by some number of weeks (1 or
2 I'd think), to allow for some "catchup" time since TripleO is often
consuming features from projects part of the integrated release.


This is where my vote would lie, given that we are consumers of the 
other projects we may need a little time to support a feature that is 
merged late in the cycle. Of course we can also have patches lined up 
ready to merge so the lag shouldn't need to be excessive.


If we don't lag we could achieve the same thing by allowing a short 
window in the stable branch where features may be allowed based on group 
opinion.





[1] http://releases.openstack.org/mitaka/schedule.html​


Thanks,

Steve

[1]

http://eavesdrop.openstack.org/meetings/tripleo/2016/tripleo.2016-02-09-14.01.log.html
[2]

https://github.com/openstack/tripleo-specs/blob/master/specs/liberty/release-branch.rst
[3] http://docs.openstack.org/project-team-guide/stable-branches.html
[4]

http://docs.openstack.org/project-team-guide/stable-branches.html#appropriate-fixes

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




--
-- James Slagle
--


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] spec-lite process for tripleo

2016-01-27 Thread Derek Higgins

Hi All,

We briefly discussed feature tracking in this weeks tripleo meeting. I 
would like to provide a way for downstream consumers (and ourselves) to 
track new features as they get implemented. The main things that came 
out of the discussion is that people liked the spec-lite process that 
the glance team are using.


I'm proposing we would start to use the same process, essentially small 
features that don't warrant a blueprint would instead have a wishlist 
bug opened against them and get marked with the spec-lite tag. This bug 
could then be referenced in the commit messages. For larger features 
blueprints can still be used. I think the process documented by 
glance[1] is a good model to follow so go read that and see what you think


The general feeling at the meeting was +1 to doing this[2] so I hope we 
can soon start enforcing it, assuming people are still happy to proceed?


thanks,
Derek.

[1] 
http://docs.openstack.org/developer/glance/contributing/blueprints.html#glance-spec-lite
[2] 
http://eavesdrop.openstack.org/meetings/tripleo/2016/tripleo.2016-01-26-14.02.log.html


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] changes in irc alerts

2016-01-26 Thread Derek Higgins

Hi All,

For the last few months we've been alerting the #tripleo irc channel 
when a card is open on the tripleo trello org, in the urgent list.


When used I think it served a good purpose to alert people to the fact 
that deploying master is currently broken, but it hasn't been used as 
much as I hoped(not to mention the duplication of sometimes needing a LB 
bug anyways). As most people are more accustomed to creating LP bugs 
when things are broken and to avoid duplication perhaps it would have 
been better to use LaunchPad to drive the alerts instead.


I've changed the bot that was looking at trello to now instead look for 
bugs on launchpad (hourly), it will alert the #tripleo channel if it 
finds a bug that matches


is filed against the tripleo project  AND
has a Importance or "Critical"AND
has the tag "alert" applied to it

I brought this up in todays meeting and people were +1 on the idea, do 
the rules above work for people? if not I can change them to something 
more suitable.


thanks,
Derek.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job

2015-12-02 Thread Derek Higgins



On 02/12/15 12:53, Steven Hardy wrote:

On Tue, Dec 01, 2015 at 05:10:57PM -0800, Devananda van der Veen wrote:

On Tue, Dec 1, 2015 at 3:22 AM, Steven Hardy <sha...@redhat.com> wrote:

  On Mon, Nov 30, 2015 at 03:35:13PM -0800, Devananda van der Veen wrote:
  >Â  Â  On Mon, Nov 30, 2015 at 3:07 PM, Zane Bitter <zbit...@redhat.com>
  wrote:
  >
  >Â  Â  Â  On 30/11/15 12:51, Ruby Loo wrote:
  >
  >Â  Â  Â  Â  On 30 November 2015 at 10:19, Derek Higgins
  <der...@redhat.com
  >Â  Â  Â  Â  <mailto:der...@redhat.com>> wrote:
  >
  >Â  Â  Â  Â  Ã*Â  Ã*Â  Hi All,
  >
  >Â  Â  Â  Â  Ã*Â  Ã*Â  Ã*Â  Ã*Â  Ã*Â A few months tripleo switch from
  its devtest based CI to
  >Â  Â  Â  Â  one
  >Â  Â  Â  Â  Ã*Â  Ã*Â  that was based on instack. Before doing this we
  anticipated
  >Â  Â  Â  Â  Ã*Â  Ã*Â  disruption in the ci jobs and removed them from
  non tripleo
  >Â  Â  Â  Â  projects.
  >
  >Â  Â  Â  Â  Ã*Â  Ã*Â  Ã*Â  Ã*Â  Ã*Â We'd like to investigate adding it
  back to heat and
  >Â  Â  Â  Â  ironic as
  >Â  Â  Â  Â  Ã*Â  Ã*Â  these are the two projects where we find our ci
  provides the
  >Â  Â  Â  Â  most
  >Â  Â  Â  Â  Ã*Â  Ã*Â  value. But we can only do this if the results
  from the job are
  >Â  Â  Â  Â  Ã*Â  Ã*Â  treated as voting.
  >
  >Â  Â  Â  Â  What does this mean? That the tripleo job could vote and do
  a -1 and
  >Â  Â  Â  Â  block ironic's gate?
  >
  >Â  Â  Â  Â  Ã*Â  Ã*Â  Ã*Â  Ã*Â  Ã*Â In the past most of the non tripleo
  projects tended to
  >Â  Â  Â  Â  ignore
  >Â  Â  Â  Â  Ã*Â  Ã*Â  the results from the tripleo job as it wasn't
  unusual for the
  >Â  Â  Â  Â  job to
  >Â  Â  Â  Â  Ã*Â  Ã*Â  broken for days at a time. The thing is, ignoring
  the results of
  >Â  Â  Â  Â  the
  >Â  Â  Â  Â  Ã*Â  Ã*Â  job is the reason (the majority of the time) it
  was broken in
  >Â  Â  Â  Â  the
  >Â  Â  Â  Â  Ã*Â  Ã*Â  first place.
  >Â  Â  Â  Â  Ã*Â  Ã*Â  Ã*Â  Ã*Â  Ã*Â To decrease the number of breakages
  we are now no longer
  >Â  Â  Â  Â  Ã*Â  Ã*Â  running master code for everything (for the non
  tripleo projects
  >Â  Â  Â  Â  we
  >Â  Â  Â  Â  Ã*Â  Ã*Â  bump the versions we use periodically if they are
  working). I
  >Â  Â  Â  Â  Ã*Â  Ã*Â  believe with this model the CI jobs we run have
  become a lot
  >Â  Â  Â  Â  more
  >Â  Â  Â  Â  Ã*Â  Ã*Â  reliable, there are still breakages but far less
  frequently.
  >
  >Â  Â  Â  Â  Ã*Â  Ã*Â  What I proposing is we add at least one of our
  tripleo jobs back
  >Â  Â  Â  Â  to
  >Â  Â  Â  Â  Ã*Â  Ã*Â  both heat and ironic (and other projects
  associated with them
  >Â  Â  Â  Â  e.g.
  >Â  Â  Â  Â  Ã*Â  Ã*Â  clients, ironicinspector etc..), tripleo will
  switch to running
  >Â  Â  Â  Â  Ã*Â  Ã*Â  latest master of those repositories and the cores
  approving on
  >Â  Â  Â  Â  those
  >Â  Â  Â  Â  Ã*Â  Ã*Â  projects should wait for a passing CI jobs before
  hitting
  >Â  Â  Â  Â  approve.
  >Â  Â  Â  Â  Ã*Â  Ã*Â  So how do people feel about doing this? can we
  give it a go? A
  >Â  Â  Â  Â  Ã*Â  Ã*Â  couple of people have already expressed an
  interest in doing
  >Â  Â  Â  Â  this
  >Â  Â  Â  Â  Ã*Â  Ã*Â  but I'd like to make sure were all in agreement
  before switching
  >Â  Â  Â  Â  it on.
  >
  >Â  Â  Â  Â  This seems to indicate that the tripleo jobs are
  non-voting, or at
  >Â  Â  Â  Â  least
  >Â  Â  Â  Â  won't block the gate -- so I'm fine with adding tripleo
  jobs to
  >Â  Â  Â  Â  ironic.
  >Â  Â  Â  Â  But if you want cores to wait/make sure they pass, then
  shouldn't they
  >Â  Â  Â  Â  be voting? (Guess I'm a bit confused.)
  >
  >Â  Â  Â  +1
  >
  >Â  Â  Â  I don't think it hurts to turn it on, but tbh I'm
  uncomfortable with the
  >Â  Â  Â  mental overhead of a non-voting job that I have to manually
  treat as a
  >Â  Â  Â  voting job. If it's stable enough to make it a voting job, I'd
  prefer we
  >Â  Â  Â  just make it voting. And if it's not then I'd like to see it
  be made
  >Â  Â  Â  stable enough to be a voting job and then make it voting.
  >
  >Â  Â  This is roughly where I sit as well -- if it's non-voting,
  experience
  >Â  Â  tells me that it will largely be ignored, and as such, isn't a
  good use of
  >Â  Â  resources.

  I'm sure you can appreciate it's something of a chicken/egg problem
  though
  - if everyone a

Re: [openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job

2015-12-01 Thread Derek Higgins



On 30/11/15 22:18, Steven Hardy wrote:

On Mon, Nov 30, 2015 at 12:51:53PM -0500, Ruby Loo wrote:

On 30 November 2015 at 10:19, Derek Higgins <der...@redhat.com> wrote:

  Hi All,

  Â  Â  A few months tripleo switch from its devtest based CI to one that
  was based on instack. Before doing this we anticipated disruption in the
  ci jobs and removed them from non tripleo projects.

  Â  Â  We'd like to investigate adding it back to heat and ironic as
  these are the two projects where we find our ci provides the most value.
  But we can only do this if the results from the job are treated as
  voting.

What does this mean? That the tripleo job could vote and do a -1 and block
ironic's gate?


I believe it means they would be non voting, but cores should be careful
not to ignore them, e.g if a patch isn't passing tripleo CI it should be
investigated before merging said patch.


Exactly, this is pretty much the situation in tripleo and has worked 
quit well.





  Â  Â  In the past most of the non tripleo projects tended to ignore the
  results from the tripleo job as it wasn't unusual for the job to broken
  for days at a time. The thing is, ignoring the results of the job is the
  reason (the majority of the time) it was broken in the first place.
  Â  Â  To decrease the number of breakages we are now no longer running
  master code for everything (for the non tripleo projects we bump the
  versions we use periodically if they are working). I believe with this
  model the CI jobs we run have become a lot more reliable, there are
  still breakages but far less frequently.

  What I proposing is we add at least one of our tripleo jobs back to both
  heat and ironic (and other projects associated with them e.g. clients,
  ironicinspector etc..), tripleo will switch to running latest master of
  those repositories and the cores approving on those projects should wait
  for a passing CI jobs before hitting approve. So how do people feel
  about doing this? can we give it a go? A couple of people have already
  expressed an interest in doing this but I'd like to make sure were all
  in agreement before switching it on.

This seems to indicate that the tripleo jobs are non-voting, or at least
won't block the gate -- so I'm fine with adding tripleo jobs to ironic.
But if you want cores to wait/make sure they pass, then shouldn't they be
voting? (Guess I'm a bit confused.)


The subtext here is that automated testing of OpenStack deployments is
hard, and TripleO CI sometimes experiences breakage for various reasons
including regressions in any one of the OpenStack projects it uses.

For example, TripleO CI has been broken for the last day or two due to a
nodepool regression - in this scenario it's probably best for Ironic and
Heat cores to maintain the ability to land patches, even if we may decide
it's unwise to land larger and/or more risky changes until they can be
validated against TripleO CI.

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job

2015-11-30 Thread Derek Higgins



On 30/11/15 17:03, Dmitry Tantsur wrote:

On 11/30/2015 04:19 PM, Derek Higgins wrote:

Hi All,

 A few months tripleo switch from its devtest based CI to one that
was based on instack. Before doing this we anticipated disruption in the
ci jobs and removed them from non tripleo projects.

 We'd like to investigate adding it back to heat and ironic as these
are the two projects where we find our ci provides the most value. But
we can only do this if the results from the job are treated as voting.

 In the past most of the non tripleo projects tended to ignore the
results from the tripleo job as it wasn't unusual for the job to broken
for days at a time. The thing is, ignoring the results of the job is the
reason (the majority of the time) it was broken in the first place.
 To decrease the number of breakages we are now no longer running
master code for everything (for the non tripleo projects we bump the
versions we use periodically if they are working). I believe with this
model the CI jobs we run have become a lot more reliable, there are
still breakages but far less frequently.

What I proposing is we add at least one of our tripleo jobs back to both
heat and ironic (and other projects associated with them e.g. clients,
ironicinspector etc..), tripleo will switch to running latest master of
those repositories and the cores approving on those projects should wait
for a passing CI jobs before hitting approve. So how do people feel
about doing this? can we give it a go? A couple of people have already
expressed an interest in doing this but I'd like to make sure were all
in agreement before switching it on.


I'm one of these "people", so definitely +1 here.

By the way, is it possible to NOT run tripleo-ci on changes touching
only tests and docs? We do the same for our devstack jobs, it saves some
infra resources.
We don't do it currently, but I'm sure we could and it sounds like a 
good idea to me.






thanks,
Derek.

__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo][ironic][heat] Adding back the tripleo check job

2015-11-30 Thread Derek Higgins

Hi All,

A few months tripleo switch from its devtest based CI to one that 
was based on instack. Before doing this we anticipated disruption in the 
ci jobs and removed them from non tripleo projects.


We'd like to investigate adding it back to heat and ironic as these 
are the two projects where we find our ci provides the most value. But 
we can only do this if the results from the job are treated as voting.


In the past most of the non tripleo projects tended to ignore the 
results from the tripleo job as it wasn't unusual for the job to broken 
for days at a time. The thing is, ignoring the results of the job is the 
reason (the majority of the time) it was broken in the first place.
To decrease the number of breakages we are now no longer running 
master code for everything (for the non tripleo projects we bump the 
versions we use periodically if they are working). I believe with this 
model the CI jobs we run have become a lot more reliable, there are 
still breakages but far less frequently.


What I proposing is we add at least one of our tripleo jobs back to both 
heat and ironic (and other projects associated with them e.g. clients, 
ironicinspector etc..), tripleo will switch to running latest master of 
those repositories and the cores approving on those projects should wait 
for a passing CI jobs before hitting approve. So how do people feel 
about doing this? can we give it a go? A couple of people have already 
expressed an interest in doing this but I'd like to make sure were all 
in agreement before switching it on.


thanks,
Derek.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [OpenStack-Infra] nodepool node getting floating IP's on tripleo cloud

2015-11-30 Thread Derek Higgins



On 30/11/15 14:56, Derek Higgins wrote:

Hi all,

 I've been asking about this on irc but given the holidays last week
the problem is still persisting, now since freenode  unavailable maybe
an email might be better.

tripleo ci jobs havn't run sense Thursday morning about (0700 UTC), it
looks to me like nodepool is constantly spinning up VMs and deleting
them again with no attempt to allocated the VM a IP address. yolanda
tells me nodepool is trying to ssh to a 10.2.x.x address (which isn't
the correct address)

 From the looks of it, this patch
https://review.openstack.org/#/c/249351/1

has caused nodepool to start treating the internal IPs as external and
it no longer attempting to allocate an floating IP.

How can we best go about getting things running again? One possibility
is the patch below I think this will make nodepool avoid the new
codepath for the tripleo cloud and at least work around the problem
https://review.openstack.org/251404


I've also proposed a revert of the nodepool commit as it may not be only 
the tripleo cloud that is effected.

https://review.openstack.org/#/c/251438/



thanks,
Derek.

___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra


Re: [openstack-dev] [TripleO] Proposing Ian Wienand as core reviewer on diskimage-builder

2015-11-03 Thread Derek Higgins



On 03/11/15 15:25, Gregory Haynes wrote:

Hello everyone,

I would like to propose adding Ian Wienand as a core reviewer on the
diskimage-builder project. Ian has been making a significant number of
contributions for some time to the project, and has been a great help in
reviews lately. Thus, I think we could benefit greatly by adding him as
a core reviewer.

Current cores - Please respond with any approvals/objections by next Friday
(November 13th).


+1 from me, Ian has been putting in a lot of good reviews in DIB over 
the last few months.




Cheers,
Greg

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] tripleo.org theme

2015-09-25 Thread Derek Higgins



On 25/09/15 13:34, Dan Prince wrote:

It has come to my attention that we aren't making great use of our
tripleo.org domain. One thing that would be useful would be to have the
new tripleo-docs content displayed there. It would also be nice to have
quick links to some of our useful resources, perhaps Derek's CI report
[1], a custom Reviewday page for TripleO reviews (something like this
[2]), and perhaps other links too. I'm thinking these go in the header,
and not just on some random TripleO docs page. Or perhaps both places.


We could even host some of these things on tripleo.org (not just link to 
them)




I was thinking that instead of the normal OpenStack theme however we
could go a bit off the beaten path and do our own TripleO theme.
Basically a custom tripleosphinx project that we ninja in as a
replacement for oslosphinx.

Could get our own mascot... or do something silly with words. I'm
reaching out to graphics artists who could help with this sort of
thing... but before that decision is made I wanted to ask about
thoughts on the matter here first.


+1 from me, the more content, articles etc... we can get up ther the 
better as long as we keep at it and it doesn't go stale.




Speak up... it would be nice to have this wrapped up before Tokyo.

[1] http://goodsquishy.com/downloads/tripleo-jobs.html
[2] http://status.openstack.org/reviews/

Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Current meeting timeslot

2015-09-15 Thread Derek Higgins

On 10/09/15 15:12, Derek Higgins wrote:

Hi All,

The current meeting slot for TripleO is every second Tuesday @ 1900 UTC,
since that time slot was chosen a lot of people have joined the team and
others have moved on, I like to revisit the timeslot to see if we can
accommodate more people at the meeting (myself included).

Sticking with Tuesday I see two other slots available that I think will
accommodate more people currently working on TripleO,

Here is the etherpad[1], can you please add your name under the time
slots that would suit you so we can get a good idea how a change would
effect people


Looks like moving the meeting to 1400 UTC will best accommodate 
everybody, I've proposed a patch to change our slot


https://review.openstack.org/#/c/223538/

In case the etherpad disappears here was the results

Current Slot ( 1900 UTC, Tuesdays,  biweekly)
o Suits me fine - 2 votes
o May make it sometimes - 6 votes

Proposal 1 ( 1600 UTC, Tuesdays,  biweekly)
o Suits me fine - 7 votes
o May make it sometimes - 2 votes

Proposal 2 ( 1400 UTC, Tuesdays,  biweekly)
o Suits me fine - 9 votes
o May make it sometimes - 0 votes

I can't make any of these - 0 votes

thanks,
Derek.




thanks,
Derek.


[1] - https://etherpad.openstack.org/p/SocOjvLr6o

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Current meeting timeslot

2015-09-15 Thread Derek Higgins



On 15/09/15 12:38, Derek Higgins wrote:

On 10/09/15 15:12, Derek Higgins wrote:

Hi All,

The current meeting slot for TripleO is every second Tuesday @ 1900 UTC,
since that time slot was chosen a lot of people have joined the team and
others have moved on, I like to revisit the timeslot to see if we can
accommodate more people at the meeting (myself included).

Sticking with Tuesday I see two other slots available that I think will
accommodate more people currently working on TripleO,

Here is the etherpad[1], can you please add your name under the time
slots that would suit you so we can get a good idea how a change would
effect people


Looks like moving the meeting to 1400 UTC will best accommodate
everybody, I've proposed a patch to change our slot

https://review.openstack.org/#/c/223538/


This has merged so as of next tuesdat, the tripleo meeting will be at 
1400UTC


Hope to see ye there



In case the etherpad disappears here was the results

Current Slot ( 1900 UTC, Tuesdays,  biweekly)
o Suits me fine - 2 votes
o May make it sometimes - 6 votes

Proposal 1 ( 1600 UTC, Tuesdays,  biweekly)
o Suits me fine - 7 votes
o May make it sometimes - 2 votes

Proposal 2 ( 1400 UTC, Tuesdays,  biweekly)
o Suits me fine - 9 votes
o May make it sometimes - 0 votes

I can't make any of these - 0 votes

thanks,
Derek.




thanks,
Derek.


[1] - https://etherpad.openstack.org/p/SocOjvLr6o

__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] trello

2015-09-10 Thread Derek Higgins



On 09/09/15 18:03, Jason Rist wrote:

On 09/09/2015 07:09 AM, Derek Higgins wrote:



On 08/09/15 16:36, Derek Higgins wrote:

Hi All,

 Some of ye may remember some time ago we used to organize TripleO
based jobs/tasks on a trello board[1], at some stage this board fell out
of use (the exact reason I can't put my finger on). This morning I was
putting a list of things together that need to be done in the area of CI
and needed somewhere to keep track of it.

I propose we get back to using this trello board and each of us add
cards at the very least for the things we are working on.

This should give each of us a lot more visibility into what is ongoing
on in the tripleo project currently, unless I hear any objections,
tomorrow I'll start archiving all cards on the boards and removing
people no longer involved in tripleo. We can then start adding items and
anybody who wants in can be added again.


This is now done, see
https://trello.com/tripleo

Please ping me on irc if you want to be added.



thanks,
Derek.

[1] - https://trello.com/tripleo

__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Derek - you weren't on today when I went to ping you, can you please add me so 
I can track it for RHCI purposes?


Done



Thanks!



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Core reviewers for python-tripleoclient and tripleo-common

2015-09-10 Thread Derek Higgins



On 10/09/15 15:06, James Slagle wrote:

TripleO has added a few new repositories, one of which is
python-tripleoclient[1], the former python-rdomanager-oscplugin.

With the additional repositories, there is an additional review burden
on our core reviewers. There is also the fact that folks who have been
working on the client code for a while when it was only part of RDO
are not TripleO core reviewers.

I think we could help with the additional burden of reviews if we made
two of those people core on python-tripleoclient and tripleo-common
now.

Specifically, the folks I'm proposing are:
Brad P. Crochet 
Dougal Matthews 

The options I see are:
- keep just 1 tripleo acl, and add additional folks there, with a good
faith agreement not to +/-2,+A code that is not from the 2 client
repos.


+1 to doing this, but I would reword the good faith aggreement to "not 
to +/-2,+A code that they are not comfortable/familiar with", in other 
words the same agreement I would expect from any other core. In the same 
way I'll not be adding +2 on tripleoclient code until(if) I know with 
reasonable confidence I'm not doing something stupid.




- create a new gerrit acl in project-config for just these 2 client
repos, and add folks there as needed. the new acl would also contain
the existing acl for tripleo core reviewers
- neither of the above options - don't add these individuals to any
TripleO core team at this time.

The first is what was more or less done when Tuskar was brought under
the TripleO umbrella to avoid splitting the core teams, and it's the
option I'd prefer.

TripleO cores, please reply here with your vote from the above
options. Or, if you have other ideas, you can share those as well :)

[1] https://review.openstack.org/#/c/215186/



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] trello

2015-09-09 Thread Derek Higgins



On 08/09/15 16:36, Derek Higgins wrote:

Hi All,

Some of ye may remember some time ago we used to organize TripleO
based jobs/tasks on a trello board[1], at some stage this board fell out
of use (the exact reason I can't put my finger on). This morning I was
putting a list of things together that need to be done in the area of CI
and needed somewhere to keep track of it.

I propose we get back to using this trello board and each of us add
cards at the very least for the things we are working on.

This should give each of us a lot more visibility into what is ongoing
on in the tripleo project currently, unless I hear any objections,
tomorrow I'll start archiving all cards on the boards and removing
people no longer involved in tripleo. We can then start adding items and
anybody who wants in can be added again.


This is now done, see
https://trello.com/tripleo

Please ping me on irc if you want to be added.



thanks,
Derek.

[1] - https://trello.com/tripleo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] trello

2015-09-08 Thread Derek Higgins

Hi All,

   Some of ye may remember some time ago we used to organize TripleO 
based jobs/tasks on a trello board[1], at some stage this board fell out 
of use (the exact reason I can't put my finger on). This morning I was 
putting a list of things together that need to be done in the area of CI 
and needed somewhere to keep track of it.


I propose we get back to using this trello board and each of us add 
cards at the very least for the things we are working on.


This should give each of us a lot more visibility into what is ongoing 
on in the tripleo project currently, unless I hear any objections, 
tomorrow I'll start archiving all cards on the boards and removing 
people no longer involved in tripleo. We can then start adding items and 
anybody who wants in can be added again.


thanks,
Derek.

[1] - https://trello.com/tripleo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Status of CI changes

2015-09-08 Thread Derek Higgins



On 03/09/15 07:34, Derek Higgins wrote:

Hi All,

The patch to reshuffle our CI jobs has merged[1], along with the patch
to switch the f21-noha job to be instack based[2] (with centos images).

So the current status is that our CI has been removed from most of the
non tripleo projects (with the exception of nova/neutron/heat and ironic
where it is only available with check experimental until we are sure its
reliable).

The last big move is to pull in some repositories into the upstream[3]
gerrit so until this happens we still have to worry about some projects
being on gerrithub (the instack based CI pulls them in from gerrithub
for now). I'll follow up with a mail once this happens


This has happened, as of now we should be developing the following 
repositories on https://review.openstack.org/#/


http://git.openstack.org/cgit/openstack/instack/
http://git.openstack.org/cgit/openstack/instack-undercloud/
http://git.openstack.org/cgit/openstack/tripleo-docs/
http://git.openstack.org/cgit/openstack/python-tripleoclient/



A lot of CI stuff still needs to be worked on (and improved) e.g.
  o Add ceph support to the instack based job
  o Add ha support to the instack based job
  o Improve the logs exposed
  o Pull out a lot of workarounds that have gone into the CI job
  o move out some of the parts we still use in tripleo-incubator
  o other stuff

Please make yourself known if your interested in any of the above

thanks,
Derek.

[1] https://review.openstack.org/#/c/205479/
[2] https://review.openstack.org/#/c/185151/
[3] https://review.openstack.org/#/c/215186/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Status of CI changes

2015-09-03 Thread Derek Higgins

Hi All,

The patch to reshuffle our CI jobs has merged[1], along with the patch 
to switch the f21-noha job to be instack based[2] (with centos images).


So the current status is that our CI has been removed from most of the 
non tripleo projects (with the exception of nova/neutron/heat and ironic

where it is only available with check experimental until we are sure its
reliable).

The last big move is to pull in some repositories into the upstream[3] 
gerrit so until this happens we still have to worry about some projects 
being on gerrithub (the instack based CI pulls them in from gerrithub 
for now). I'll follow up with a mail once this happens


A lot of CI stuff still needs to be worked on (and improved) e.g.
 o Add ceph support to the instack based job
 o Add ha support to the instack based job
 o Improve the logs exposed
 o Pull out a lot of workarounds that have gone into the CI job
 o move out some of the parts we still use in tripleo-incubator
 o other stuff

Please make yourself known if your interested in any of the above

thanks,
Derek.

[1] https://review.openstack.org/#/c/205479/
[2] https://review.openstack.org/#/c/185151/
[3] https://review.openstack.org/#/c/215186/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Moving instack upstream

2015-08-24 Thread Derek Higgins



On 24/08/15 09:14, Dougal Matthews wrote:





- Original Message -

From: Ben Nemec openst...@nemebean.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Sent: Friday, 21 August, 2015 11:00:31 PM
Subject: Re: [openstack-dev] [TripleO] Moving instack upstream

On 08/19/2015 12:22 PM, Dougal Matthews wrote:





- Original Message -

From: Dmitry Tantsur dtant...@redhat.com
To: openstack-dev@lists.openstack.org
Sent: Wednesday, 19 August, 2015 5:57:36 PM
Subject: Re: [openstack-dev] [TripleO] Moving instack upstream

On 08/19/2015 06:42 PM, Derek Higgins wrote:

On 06/08/15 15:01, Dougal Matthews wrote:

- Original Message -

From: Dan Prince dpri...@redhat.com
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Sent: Thursday, 6 August, 2015 1:12:42 PM
Subject: Re: [openstack-dev] [TripleO] Moving instack upstream


snip


I would really like to see us rename python-rdomanager-oscplugin. I
think any project having the name RDO in it probably doesn't belong
under TripleO proper. Looking at the project there are some distro
specific things... but those are fairly encapsulated (or could be made
so fairly easily).


I agree, it made sense as it was the entrypoint to RDO-Manager. However,
it could easily be called the python-tripleo-oscplugin or similar. The
changes would be really trivial, I can only think of one area that
may be distro specific.


I'm putting the commit together now to pull these repositories into
upstream tripleo are we happy with the name python-tripleo-oscplugin ?


Do we really need this oscplugin postfix? It may be clear for some of
us, but I don't that our users know that OSC means OpenStackClient, and
that oscplugin designates something that adds features to openstack
command. Can't we just call it python-tripleo? or maybe even just
tripleo?


+1 to either.

Having oscplugin in the name just revealed an implementation detail, there
may be a point where for some reason everyone moves away from OSC.


FWIW, I would prefer tripleo-client.  That's more in line with what the
other projects do, and doesn't carry the potential for confusion that
just naming it tripleo would.


How about tripleo-cli? I objected to client because it suggested there was a
server (which may happen, but hasn't yet).


The patch went up last week[1], can we move this conversation to the 
review so it doesn't get lost,


[1] - https://review.openstack.org/#/c/215186/3/gerrit/projects.yaml,cm














__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Moving instack upstream

2015-08-20 Thread Derek Higgins

Sorry for the delay in following up on this but I've been on my hols

The instack ci jobs should now work, it appears to be getting hit by a 
LOT of network glitches in particular downloading from the centos 
infrastructure but I think to avoid more delay we should merge the 
things and work on improving reliability, We have 3 things to merge in 
order to start the move


1. Switch the f21 nonha job to use instack (eventually we'll remove the 
old codepath)

https://review.openstack.org/#/c/185151/

2. Remove most of our ci jobs (we can build back up again afterwards)
https://review.openstack.org/#/c/205479/

3. pull in 3 repositories upstream
https://review.openstack.org/#/c/215186/

There will be follow up patches after these but this will get the ball 
rolling


thanks,
Derek.

On 23/07/15 07:40, Derek Higgins wrote:

See below

On 21/07/15 20:29, Derek Higgins wrote:

Hi All,
Something we discussed at the summit was to switch the focus of
tripleo's deployment method to deploy using instack using images built
with tripleo-puppet-elements. Up to now all the instack work has been
done downstream of tripleo as part of rdo. Having parts of our
deployment story outside of upstream gives us problems mainly because it
becomes very difficult to CI what we expect deployers to use while we're
developing the upstream parts.

Essentially what I'm talking about here is pulling instack-undercloud
upstream along with a few of its dependency projects (instack,
tripleo-common, tuskar-ui-extras etc..) into tripleo and using them in
our CI in place of devtest.

Getting our CI working with instack is close to working but has taken
longer then I expected because of various complications and distractions
but I hope to have something over the next few days that we can use to
replace devtest in CI, in a lot of ways this will start out by taking a
step backwards but we should finish up in a better place where we will
be developing (and running CI on) what we expect deployers to use.

Once I have something that works I think it makes sense to drop the jobs
undercloud-precise-nonha and overcloud-precise-nonha, while switching
overcloud-f21-nonha to use instack, this has a few effects that need to
be called out

1. We will no longer be running CI on (and as a result not supporting)
most of the the bash based elements
2. We will no longer be running CI on (and as a result not supporting)
ubuntu

Should anybody come along in the future interested in either of these
things (and prepared to put the time in) we can pick them back up again.
In fact the move to puppet element based images should mean we can more
easily add in extra distros in the future.

3. While we find our feet we should remove all tripleo-ci jobs from non
tripleo projects, once we're confident with it we can explore adding our
jobs back into other projects again

Nothing has changed yet, I order to check we're all on the same page
this is high level details of how I see things should proceed so shout
now if I got anything wrong or you disagree.


Ok, I have a POC that has worked end to end in our CI environment[1],
there are a *LOT* of workarounds in there so before we can merge it I
need to clean up and remove some of those workarounds and todo that a
few things need to move around, below is a list of what has to happen
(as best I can tell)

1) Pull in tripleo-heat-template spec changes to master delorean
We had two patches in the tripleo-heat-template midstream packaging that
havn't made it into the master packaging, these are
https://review.gerrithub.io/241056 Package firstboot and extraconfig
templates
https://review.gerrithub.io/241057 Package environments and newtork
directories

2) Fixes for instack-undercloud (I didn't push these directly incase it
effected people on old versions of puppet modules)
https://github.com/rdo-management/instack-undercloud/pull/5

3) Add packaging for various repositories into openstack-packaging
I've pulled the packaging for 5 repositories into
https://github.com/openstack-packages
https://github.com/openstack-packages/python-ironic-inspector-client
https://github.com/openstack-packages/python-rdomanager-oscplugin
https://github.com/openstack-packages/tuskar-ui-extras
https://github.com/openstack-packages/ironic-discoverd
https://github.com/openstack-packages/tripleo-common

I havn't imported these into gerrithub (incase following discussion we
need to delete them again) but assuming we're in agreement we should
pull them into gerrithub)

4) update rdoinfo
https://github.com/redhat-openstack/rdoinfo/pull/69
If everybody is happy with all above we should merge this, all of the
packages needed will now be on the delorean master repository

5) Move DELOREAN_REPO_URL in tripleo-ci to a new delorean repo that
includes all of the new packages

6) Take most of the workarounds out of this patch[1] and merge it

7) Reorg the tripleo ci tests (essentially remove all of the bash
element based tests).

8) Pull instack, instack

Re: [openstack-dev] [TripleO] Moving instack upstream

2015-08-19 Thread Derek Higgins

On 06/08/15 15:01, Dougal Matthews wrote:

- Original Message -

From: Dan Prince dpri...@redhat.com
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Sent: Thursday, 6 August, 2015 1:12:42 PM
Subject: Re: [openstack-dev] [TripleO] Moving instack upstream


snip


I would really like to see us rename python-rdomanager-oscplugin. I
think any project having the name RDO in it probably doesn't belong
under TripleO proper. Looking at the project there are some distro
specific things... but those are fairly encapsulated (or could be made
so fairly easily).


I agree, it made sense as it was the entrypoint to RDO-Manager. However,
it could easily be called the python-tripleo-oscplugin or similar. The
changes would be really trivial, I can only think of one area that
may be distro specific.


I'm putting the commit together now to pull these repositories into 
upstream tripleo are we happy with the name python-tripleo-oscplugin ?




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Moving instack upstream

2015-07-24 Thread Derek Higgins



On 24/07/15 00:56, James Slagle wrote:

On Thu, Jul 23, 2015 at 2:40 AM, Derek Higgins der...@redhat.com wrote:

See below


On 21/07/15 20:29, Derek Higgins wrote:


Hi All,
 Something we discussed at the summit was to switch the focus of
tripleo's deployment method to deploy using instack using images built
with tripleo-puppet-elements. Up to now all the instack work has been
done downstream of tripleo as part of rdo. Having parts of our
deployment story outside of upstream gives us problems mainly because it
becomes very difficult to CI what we expect deployers to use while we're
developing the upstream parts.

Essentially what I'm talking about here is pulling instack-undercloud
upstream along with a few of its dependency projects (instack,
tripleo-common, tuskar-ui-extras etc..) into tripleo and using them in
our CI in place of devtest.

Getting our CI working with instack is close to working but has taken
longer then I expected because of various complications and distractions
but I hope to have something over the next few days that we can use to
replace devtest in CI, in a lot of ways this will start out by taking a
step backwards but we should finish up in a better place where we will
be developing (and running CI on) what we expect deployers to use.

Once I have something that works I think it makes sense to drop the jobs
undercloud-precise-nonha and overcloud-precise-nonha, while switching
overcloud-f21-nonha to use instack, this has a few effects that need to
be called out

1. We will no longer be running CI on (and as a result not supporting)
most of the the bash based elements
2. We will no longer be running CI on (and as a result not supporting)
ubuntu

Should anybody come along in the future interested in either of these
things (and prepared to put the time in) we can pick them back up again.
In fact the move to puppet element based images should mean we can more
easily add in extra distros in the future.

3. While we find our feet we should remove all tripleo-ci jobs from non
tripleo projects, once we're confident with it we can explore adding our
jobs back into other projects again

Nothing has changed yet, I order to check we're all on the same page
this is high level details of how I see things should proceed so shout
now if I got anything wrong or you disagree.



Ok, I have a POC that has worked end to end in our CI environment[1], there
are a *LOT* of workarounds in there so before we can merge it I need to
clean up and remove some of those workarounds and todo that a few things
need to move around, below is a list of what has to happen (as best I can
tell)

1) Pull in tripleo-heat-template spec changes to master delorean
We had two patches in the tripleo-heat-template midstream packaging that
havn't made it into the master packaging, these are
https://review.gerrithub.io/241056 Package firstboot and extraconfig
templates
https://review.gerrithub.io/241057 Package environments and newtork
directories


I've merged these 2 (the ones against the correct branch, not the ones
you abandoned :-) )


thanks





2) Fixes for instack-undercloud (I didn't push these directly incase it
effected people on old versions of puppet modules)
https://github.com/rdo-management/instack-undercloud/pull/5


Can you submit this on gerrithub?:
https://review.gerrithub.io/#/q/project:rdo-management/instack-undercloud


Duh, I don't know why I thought we were using gerrit for the templates 
and not instack*, sorry


https://review.gerrithub.io/241257
https://review.gerrithub.io/241257






3) Add packaging for various repositories into openstack-packaging
I've pulled the packaging for 5 repositories into
https://github.com/openstack-packages
https://github.com/openstack-packages/python-ironic-inspector-client
https://github.com/openstack-packages/python-rdomanager-oscplugin
https://github.com/openstack-packages/tuskar-ui-extras
https://github.com/openstack-packages/ironic-discoverd
https://github.com/openstack-packages/tripleo-common

I havn't imported these into gerrithub (incase following discussion we need
to delete them again) but assuming we're in agreement we should pull them
into gerrithub)

4) update rdoinfo
https://github.com/redhat-openstack/rdoinfo/pull/69
If everybody is happy with all above we should merge this, all of the
packages needed will now be on the delorean master repository

5) Move DELOREAN_REPO_URL in tripleo-ci to a new delorean repo that includes
all of the new packages

6) Take most of the workarounds out of this patch[1] and merge it

7) Reorg the tripleo ci tests (essentially remove all of the bash element
based tests).


3 - 7 sound good to me.



8) Pull instack, instack-undercloud, python-rdomanager-oscplugin,
triple-common, tuskar-ui-extras and maybe more into the upstream gerrit


+1, note that tripleo-common is already in gerrit/git.openstack.org
(http://git.openstack.org/cgit/openstack/tripleo-common)

ack, thanks, I've updated the patch to rdoinfo





 From here

Re: [openstack-dev] [TripleO] Moving instack upstream

2015-07-23 Thread Derek Higgins

See below

On 21/07/15 20:29, Derek Higgins wrote:

Hi All,
Something we discussed at the summit was to switch the focus of
tripleo's deployment method to deploy using instack using images built
with tripleo-puppet-elements. Up to now all the instack work has been
done downstream of tripleo as part of rdo. Having parts of our
deployment story outside of upstream gives us problems mainly because it
becomes very difficult to CI what we expect deployers to use while we're
developing the upstream parts.

Essentially what I'm talking about here is pulling instack-undercloud
upstream along with a few of its dependency projects (instack,
tripleo-common, tuskar-ui-extras etc..) into tripleo and using them in
our CI in place of devtest.

Getting our CI working with instack is close to working but has taken
longer then I expected because of various complications and distractions
but I hope to have something over the next few days that we can use to
replace devtest in CI, in a lot of ways this will start out by taking a
step backwards but we should finish up in a better place where we will
be developing (and running CI on) what we expect deployers to use.

Once I have something that works I think it makes sense to drop the jobs
undercloud-precise-nonha and overcloud-precise-nonha, while switching
overcloud-f21-nonha to use instack, this has a few effects that need to
be called out

1. We will no longer be running CI on (and as a result not supporting)
most of the the bash based elements
2. We will no longer be running CI on (and as a result not supporting)
ubuntu

Should anybody come along in the future interested in either of these
things (and prepared to put the time in) we can pick them back up again.
In fact the move to puppet element based images should mean we can more
easily add in extra distros in the future.

3. While we find our feet we should remove all tripleo-ci jobs from non
tripleo projects, once we're confident with it we can explore adding our
jobs back into other projects again

Nothing has changed yet, I order to check we're all on the same page
this is high level details of how I see things should proceed so shout
now if I got anything wrong or you disagree.


Ok, I have a POC that has worked end to end in our CI environment[1], 
there are a *LOT* of workarounds in there so before we can merge it I 
need to clean up and remove some of those workarounds and todo that a 
few things need to move around, below is a list of what has to happen 
(as best I can tell)


1) Pull in tripleo-heat-template spec changes to master delorean
We had two patches in the tripleo-heat-template midstream packaging that 
havn't made it into the master packaging, these are
https://review.gerrithub.io/241056 Package firstboot and extraconfig 
templates
https://review.gerrithub.io/241057 Package environments and newtork 
directories


2) Fixes for instack-undercloud (I didn't push these directly incase it 
effected people on old versions of puppet modules)

https://github.com/rdo-management/instack-undercloud/pull/5

3) Add packaging for various repositories into openstack-packaging
I've pulled the packaging for 5 repositories into 
https://github.com/openstack-packages

https://github.com/openstack-packages/python-ironic-inspector-client
https://github.com/openstack-packages/python-rdomanager-oscplugin
https://github.com/openstack-packages/tuskar-ui-extras
https://github.com/openstack-packages/ironic-discoverd
https://github.com/openstack-packages/tripleo-common

I havn't imported these into gerrithub (incase following discussion we 
need to delete them again) but assuming we're in agreement we should 
pull them into gerrithub)


4) update rdoinfo
https://github.com/redhat-openstack/rdoinfo/pull/69
If everybody is happy with all above we should merge this, all of the 
packages needed will now be on the delorean master repository


5) Move DELOREAN_REPO_URL in tripleo-ci to a new delorean repo that 
includes all of the new packages


6) Take most of the workarounds out of this patch[1] and merge it

7) Reorg the tripleo ci tests (essentially remove all of the bash 
element based tests).


8) Pull instack, instack-undercloud, python-rdomanager-oscplugin, 
triple-common, tuskar-ui-extras and maybe more into the upstream gerrit


From here on the way to run the tripleo will be to follow the 
documentation in instack-undercloud, we should no longer be using 
devtest, this means we've lost the automation devtest gave us, so we 
will have to slowly build that up again. The main thing we have gained 
is that we will now be developing upstream all parts of how we expect 
deployers to use tripleo.


- we will still have dependencies on tripleo-incubator we need to remove 
these (or move things into other repositories), essentially were 
finished with this process once we're no longer installing the tripleo 
package.
- The new CI (as is) is running on Fedora jenkins nodes but building 
(and deploying) centos images, we also

Re: [openstack-dev] [TripleO] Moving instack upstream

2015-07-22 Thread Derek Higgins

On 22/07/15 18:41, Gregory Haynes wrote:

Excerpts from Derek Higgins's message of 2015-07-21 19:29:49 +:

Hi All,
 Something we discussed at the summit was to switch the focus of
tripleo's deployment method to deploy using instack using images built
with tripleo-puppet-elements. Up to now all the instack work has been
done downstream of tripleo as part of rdo. Having parts of our
deployment story outside of upstream gives us problems mainly because it
becomes very difficult to CI what we expect deployers to use while we're
developing the upstream parts.

Essentially what I'm talking about here is pulling instack-undercloud
upstream along with a few of its dependency projects (instack,
tripleo-common, tuskar-ui-extras etc..) into tripleo and using them in
our CI in place of devtest.

Getting our CI working with instack is close to working but has taken
longer then I expected because of various complications and distractions
but I hope to have something over the next few days that we can use to
replace devtest in CI, in a lot of ways this will start out by taking a
step backwards but we should finish up in a better place where we will
be developing (and running CI on) what we expect deployers to use.

Once I have something that works I think it makes sense to drop the jobs
undercloud-precise-nonha and overcloud-precise-nonha, while switching
overcloud-f21-nonha to use instack, this has a few effects that need to
be called out

1. We will no longer be running CI on (and as a result not supporting)
most of the the bash based elements
2. We will no longer be running CI on (and as a result not supporting)
ubuntu


I'd like to point out that this means DIB will no longer have an image
booting test for Ubuntu. I have created a review[1] to try and get some
coverage of this in a dib speific test, hopefully we can get it merged
before we remove the tripleo ubuntu tests?


I should have mentioned this, the plan we discussed to cover this case 
was that the nonha test would build and boot an ubuntu based user image 
on the deployed cloud. I like the look of the test you proposed, I'll 
give it a proper review in the morning, which ever we end up using I 
agree we should continue to test that DIB can create a bootable Ubuntu 
image.






Should anybody come along in the future interested in either of these
things (and prepared to put the time in) we can pick them back up again.
In fact the move to puppet element based images should mean we can more
easily add in extra distros in the future.

3. While we find our feet we should remove all tripleo-ci jobs from non
tripleo projects, once we're confident with it we can explore adding our
jobs back into other projects again


I assume DIB will be keeping the tripleo jobs for now?
Yup, we should still be running tripleo tests on DIB although I don't 
believe we need to run every tripleo test on it.






Nothing has changed yet, I order to check we're all on the same page
this is high level details of how I see things should proceed so shout
now if I got anything wrong or you disagree.

Sorry for not sending this out sooner for those of you who weren't at
the summit,
Derek.



-Greg

[1] https://review.openstack.org/#/c/204639/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Moving instack upstream

2015-07-21 Thread Derek Higgins

Hi All,
   Something we discussed at the summit was to switch the focus of 
tripleo's deployment method to deploy using instack using images built 
with tripleo-puppet-elements. Up to now all the instack work has been 
done downstream of tripleo as part of rdo. Having parts of our 
deployment story outside of upstream gives us problems mainly because it 
becomes very difficult to CI what we expect deployers to use while we're 
developing the upstream parts.


Essentially what I'm talking about here is pulling instack-undercloud 
upstream along with a few of its dependency projects (instack, 
tripleo-common, tuskar-ui-extras etc..) into tripleo and using them in 
our CI in place of devtest.


Getting our CI working with instack is close to working but has taken 
longer then I expected because of various complications and distractions 
but I hope to have something over the next few days that we can use to 
replace devtest in CI, in a lot of ways this will start out by taking a 
step backwards but we should finish up in a better place where we will 
be developing (and running CI on) what we expect deployers to use.


Once I have something that works I think it makes sense to drop the jobs 
undercloud-precise-nonha and overcloud-precise-nonha, while switching 
overcloud-f21-nonha to use instack, this has a few effects that need to 
be called out


1. We will no longer be running CI on (and as a result not supporting) 
most of the the bash based elements
2. We will no longer be running CI on (and as a result not supporting) 
ubuntu


Should anybody come along in the future interested in either of these 
things (and prepared to put the time in) we can pick them back up again. 
In fact the move to puppet element based images should mean we can more 
easily add in extra distros in the future.


3. While we find our feet we should remove all tripleo-ci jobs from non 
tripleo projects, once we're confident with it we can explore adding our 
jobs back into other projects again


Nothing has changed yet, I order to check we're all on the same page 
this is high level details of how I see things should proceed so shout 
now if I got anything wrong or you disagree.


Sorry for not sending this out sooner for those of you who weren't at 
the summit,

Derek.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [packaging] Adding packaging as an OpenStack project

2015-06-09 Thread Derek Higgins



On 09/06/15 10:37, Dirk Müller wrote:

Hi Derek,

2015-06-09 0:34 GMT+02:00 Derek Higgins der...@redhat.com:


This patch would result in 80 packaging repositories being pulled into
gerrit.


I personally would prefer to start with fewer but common packages
between all RPM distros (is there more than Red Hat and SUSE ?) than
starting the process with 80, but I wouldn't object to that.


I selected these 80 to move all of what RDO is currently maintaining on 
gerrithub to review.openstack.org, this was perhaps too big a set and in 
RDO we instead may need to go hybrid.





o exactly what namespace/prefix to use in the naming, I've seen lots of
opinions but I'm not clear if we have come to a decision

o Should we use rdo in the packaging repo names and not rpm? I think
this ultimatly depends whether the packaging can be shared between RDO and
Suse or not.


Well, we're (SUSE that is) are interested in sharing the packaging,
and a non-RDO prefix would be preferred for the upstream coordination
efforts.


+1, I'd also like to see us share packaging so a non-RDO prefix should 
be avoided. I think we have a few possibilities here


1. pull what I've proposed (or a subset of it) into a rpm namespace and 
from there work in package to get them to a point where all rpm 
interested parties can use them.


2. pull them into an rdo namespace and from there work on convergence, 
as each package becomes usable by all interested parties we rename to rpm-


I know renaming is a PITA for infra so maybe move to Attic and import a 
new repo if its easier.


3. Same as 2 but start with Suse packaging


It is all a bit fuzzy for me right now as I'm not entirely
sure our goals for packaging are necessarily the same (e.g. we have
the tendency to include patches that have not been merged but are
proposed upstream and are +1'ed already into our packages should there
be a pressing need for us to do so (e.g. fixes an important platform
bug), but maybe we can find enough common goals to make this a
benificial effort for all of us.


For this specific example I think differences of opinion are ok, we 
should provide the tools for each party interest in the packaging can 
hook in their own patches (I'm not sure what this would look like yet), 
I'm assuming here that we would also have deployer's out there 
interested who would have their own custom patches and bug fixes that 
they are interested in.


But yes, there will be other differences that I'm sure we'll have to 
figure out.




There are quite some details to sort out as our packaging is for
historical and for various policy reasons that we need to stick to
slightly different than the RDO packaging. I think going over those
and see how we can merge them in a consolidated effort (or maintain
two variants together) is the first step IMHO.


+1, maybe we should schedule something in a few days where we could go 
though the differences of a specific package and how things could take 
shape.




Another important point for us is that we start with equal rights on
the upstream collaboration (at least on the RPM side, I am fine with
decoupling and not caring about the deb parts). I'm not overly
optimistic that a single PTL would be able to cover both the DEB and
RPM worlds, as I perceive them quite different in details.


yup, seems reasonable to me



Greetings,
Dirk

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [packaging] Adding packaging as an OpenStack project

2015-06-08 Thread Derek Higgins



On 03/06/15 17:28, Haïkel wrote:

2015-06-03 17:23 GMT+02:00 Thomas Goirand z...@debian.org:

i
On 06/03/2015 12:41 AM, James E. Blair wrote:

Hi,

This came up at the TC meeting today, and I volunteered to provide an
update from the discussion.


I've just read the IRC logs. And there's one thing I would like to make
super clear.



I still haven't read the logs as we had our post-mortem meeting today,
but I'll try to address your points.


We, ie: Debian  Ubuntu folks, are very much clear on what we want to
achieve. The project has been maturing in our heads for like more than 2
years. We would like that ultimately, only a single set of packages Git
repositories exist. We already worked on *some* convergence during the
last years, but now we want a *full* alignment.

We're not 100% sure how the implementation details will look like for
the core packages (like about using the Debconf interface for
configuring packages), but it will eventually happen. For all the rest
(ie: Python module packaging), which represent the biggest work, we're
already converging and this has zero controversy.

Now, the Fedora/RDO/Suse people jumped on the idea to push packaging on
the upstream infra. Great. That's socially tempting. But technically, I
don't really see the point, apart from some of the infra tooling (super
cool if what Paul Belanger does works for both Deb+RPM). Finally,
indeed, this is not totally baked. But let's please not delay the
Debian+Ubuntu upstream Gerrit collaboration part because of it. We would
like to get started, and for the moment, nobody is approving the
/stackforge/deb-openstack-pkg-tools [1] new repository because we're
waiting on the TC decision.



First, we all agree that we should move packaging recipes (to use a
neutral term)
and reviewing to upstream gerrit. That should *NOT* be delayed.
We (RDO) are even willing to transfer full control of the openstack-packages
namespace on github. If you want to use another namespace, it's also
fine with us.

Then, about the infra/tooling things, it looks like a misunderstanding.
If we don't find an agreement on these topics, it's perfectly fine and
should not
prevent moving to upstream gerrit

So let's break the discussion in two parts.

1. upstream gerrit shared by everyone and get this started asap


In an attempt to document how this would look for RDO, I've started a 
patch[1] that I'll iterate on while this discussions converges on a 
solution that will work.


This patch would result in 80 packaging repositories being pulled into 
gerrit.


I've left a TODO in the commit message to track questions I believe we 
still have to answer, most notably


o exactly what namespace/prefix to use in the naming, I've seen lots of 
opinions but I'm not clear if we have come to a decision


o Should we use rdo in the packaging repo names and not rpm? I think 
this ultimatly depends whether the packaging can be shared between RDO 
and Suse or not.


o Do the RDO packaging repo's fall under this project[2] or is it its 
own group



[1] https://review.openstack.org/#/c/189497
[2] https://review.openstack.org/#/c/185187





2. continue discussion about infra/tooling within the new project, without
presumin the outcome.

Does it look like a good compromise to you?

Regards,
H.



Cheers,

Thomas Goirand (zigo)

[1] https://review.openstack.org/#/c/185164/


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [packaging] Adding packaging as an OpenStack project

2015-06-03 Thread Derek Higgins



On 02/06/15 23:41, James E. Blair wrote:

Hi,

This came up at the TC meeting today, and I volunteered to provide an
update from the discussion.

In general, I think there is a lot of support for a packaging effort in
OpenStack.  The discussion here has been great; we need to answer a few
questions, get some decisions written down, and make sure we have
agreement.

Here's what we need to know:

1) Is this one or more than one horizontal effort?

In other words, do we think the idea of having a single packaging
project/team with collaboration among distros is going to work?  Or
should we look at it more like the deployment projects where we have
puppet and chef as top level OpenStack projects?


As far as packaging goes Id imaging the teams will be split into groups 
of people who are interested into specific packaging formats (or perhaps 
distro), these people would be responsible for package updates, reviews, 
etc...


On the specifics of the packaging details, collaboration between these 
groups should be encouraged  but not enforced. I would hope that this 
means we would find the places where packaging details can converge 
while staying within the constraints of distro recommendations.




Either way is fine, and regardless, we need to answer the next
questions:

2) What's the collaboration plan?

How will different distros collaborate with each other, if at all?  What
things are important to standardize on, what aren't and how do we
support them all.


Collaboration between these groups is important in order to keep a few 
things consistent


o package repository naming, we should all agree on a naming scheme for 
the packaging repositories to avoid situations where we have rpm-nova 
and deb-compute


o Tools to build packages in CI jobs should provide a consistent 
interface regardless of packaging being built




3) What are the plans for repositories and their contents?

What repos will be created, and what will be in them.  When will new
ones be created, and is there any process around that.


Assuming you mean git repositories ? I think anything under the 
openstack(or stackforge) umbrella is fair game along with anything in 
the global-requirements file.


If you meant package repositories I think none is a fine answer for the 
moment but if there is an appetite for them then I think what would 
eventually make most sense are repositories for master branches along 
with supported stable branches. This may differ between packaging 
formats and what their teams are prepared to support.




4) Who is on the team(s)?

Who is interested in the overall effort?  Who is signing up for
distro-specific work?  Who will be the initial PTL?


From the RDO point of view we are doing the trunk chasing work already 
downstream. If we were to shift this packaging upstream of RDO I would 
imagine we would just switch the gerrit we are submitting too. I don't 
speak for RDO but of the people I spoke too I didn't hear any resistance 
to this idea.




I think if the discussion here can answer those questions, you should
update the governance repo change with that information, we can get all
the participants to ack that, and the TC will be able to act.


Great and thanks,
Derek.



Thanks again for driving this.

-Jim

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [packaging] Adding packaging as an OpenStack project

2015-05-29 Thread Derek Higgins



On 28/05/15 22:09, Thomas Goirand wrote:

On 05/28/2015 02:53 PM, Derek Higgins wrote:



On 28/05/15 12:07, Jaume Devesa wrote:

Hi Thomas,

Delorean is a tool to build rpm packages from master branches (maybe any
branch?) of OpenStack projects.

Check out here:
https://www.rdoproject.org/packaging/rdo-packaging.html#master-pkg-guide


Following those instructions you'll notice that the rpm's are being
built using rpmbuild inside a docker container, if expanding to add dep
support this is where we could plug in sbuild.


sbuild by itself already provides the single use trow-able chroot
feature, with very effective back ends like AUFS or LVM snapshots.
Adding docker would only have the bad effect to remove the package
caching feature of sbuild, so it makes no sense to use it, as sbuild
would constantly download from the internet instead of using its package
cache.

Also, it is my understanding that infra will not accept to use
long-living VMs, and prefer to spawn new instances. In such a case, I
don't see the point using docker which would be a useless layer. In
fact, I was even thinking that in this case, sbuild wouldn't be
required, and we could simply use mk-build-deps and git-buildpackage
without even using sbuild. The same dependency resolver (ie: apt) would
then be in use, just without the added sbuild layer. I used that already
to automatically build backports, and it was really fast.

Did I miss something here? Apart from the fact that Docker is trendy,
what feature would it bring?



The reason I choose docker was essentially for a chroot to build the 
packages in while having the various distro images easily available, 
other people have shown interest in using mock in the passed so we may 
switch to it at some stage in the future. Whats important I think is 
that we can change things to use sbuild without docker if that is what 
works best for you for debs.


I think the feature in delorean that is most useful is that it will 
continue to maintain a history of usable package repositories 
representing the openstack projects over time, for this we would need a 
long running instance, but that can happen outside of infra.


Once we have all of the packaging available in infra we can use any tool 
to build it as part of CI, my preference for delorean is because it 
would match how we would want to run a long running delorean server.


All of this needs to be preceded by actually importing the packaging 
into review.openstack.org , so lets talk to infra first about how we 
should go about that and we can converge on processes secondary to that?



I'm traveling for a lot of next week but would like to try and start 
working on importing things to gerrit soon, so will try and get some 
prep done over the next week to import the RDO packaging but in reality 
it will probably be the following week before its ready (unless of 
course somebody else wants todo it).





By the way, one question: does Delorean use mock? We had the discussion
during an internal meeting, and we were not sure about this...


Nope, not using mock currently



Cheers,

Thomas Goirand (zigo)


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [packaging] Adding packaging as an OpenStack project

2015-05-29 Thread Derek Higgins



On 28/05/15 20:58, Paul Belanger wrote:

On 05/27/2015 05:26 PM, Derek Higgins wrote:

On 27/05/15 09:14, Thomas Goirand wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi all,

tl;dr:
- - We'd like to push distribution packaging of OpenStack on upstream
gerrit with reviews.
- - The intention is to better share the workload, and improve the
overall
QA for packaging *and* upstream.
- - The goal is *not* to publish packages upstream
- - There's an ongoing discussion about using stackforge or openstack.
This isn't, IMO, that important, what's important is to get started.
- - There's an ongoing discussion about using a distribution specific
namespace, my own opinion here is that using /openstack-pkg-{deb,rpm} or
/stackforge-pkg-{deb,rpm} would be the most convenient because of a
number of technical reasons like the amount of Git repository.
- - Finally, let's not discuss for too long and let's do it!!! :)

Longer version:

Before I start: some stuff below is just my own opinion, others are just
given facts. I'm sure the reader is smart enough to guess which is what,
and we welcome anyone involved in the project to voice an opinion if
he/she differs.

During the Vancouver summit, operation, Canonical, Fedora and Debian
people gathered and collectively expressed the will to maintain
packaging artifacts within upstream OpenStack Gerrit infrastructure. We
haven't decided all the details of the implementation, but spent the
Friday morning together with members of the infra team (hi Paul!) trying
to figure out what and how.

A number of topics have been raised, which needs to be shared.

First, we've been told that such a topic deserved a message to the dev
list, in order to let groups who were not present at the summit. Yes,
there was a consensus among distributions that this should happen, but
still, it's always nice to let everyone know.

So here it is. Suse people (and other distributions), you're welcome to
join the effort.

- - Why doing this

It's been clear to both Canonical/Ubuntu teams, and Debian (ie: myself)
that we'd be a way more effective if we worked better together, on a
collaborative fashion using a review process like on upstream Gerrit.
But also, we'd like to welcome anyone, and especially the operation
folks, to contribute and give feedback. Using Gerrit is the obvious way
to give everyone a say on what we're implementing.

As OpenStack is welcoming every day more and more projects, it's making
even more sense to spread the workload.

This is becoming easier for Ubuntu guys as Launchpad now understand not
only BZR, but also Git.

We'd start by merging all of our packages that aren't core packages
(like all the non-OpenStack maintained dependencies, then the Oslo libs,
then the clients). Then we'll see how we can try merging core packages.

Another reason is that we believe working with the infra of OpenStack
upstream will improve the overall quality of the packages. We want to be
able to run a set of tests at build time, which we already do on each
distribution, but now we want this on every proposed patch. Later on,
when we have everything implemented and working, we may explore doing a
package based CI on every upstream patch (though, we're far from doing
this, so let's not discuss this right now please, this is a very long
term goal only, and we will have a huge improvement already *before*
this is implemented).

- - What it will *not* be
===
We do not have the intention (yet?) to publish the resulting packages
built on upstream infra. Yes, we will share the same Git repositories,
and yes, the infra will need to keep a copy of all builds (for example,
because core packages will need oslo.db to build and run unit tests).
But we will still upload on each distributions on separate repositories.
So published packages by the infra isn't currently discussed. We could
get to this topic once everything is implemented, which may be nice
(because we'd have packages following trunk), though please, refrain to
engage in this topic right now: having the implementation done is more
important for the moment. Let's try to stay on tracks and be
constructive.

- - Let's keep efficiency in mind
===
Over the last few years, I've been able to maintain all of OpenStack in
Debian with little to no external contribution. Let's hope that the
Gerrit workflow will not slow down too much the packaging work, even if
there's an unavoidable overhead. Hopefully, we can implement some
liberal ACL policies for the core reviewers so that the Gerrit workflow
don't slow down anyone too much. For example we may be able to create
new repositories very fast, and it may be possible to self-approve some
of the most trivial patches (for things like typo in a package
description, adding new debconf translations, and such obvious fixes, we
shouldn't waste our time).

There's a middle ground between the current system (ie: only write
access ACLs

Re: [openstack-dev] [packaging] Adding packaging as an OpenStack project

2015-05-29 Thread Derek Higgins



On 29/05/15 02:54, Steve Kowalik wrote:

On 29/05/15 06:41, Haïkel wrote:

Here's the main script to rebuild a RPM package.
https://github.com/openstack-packages/delorean/blob/master/scripts/build_rpm.sh

The script basically uses rpmbuild to build packages, we could have a
build_deb.sh that uses
sbuild and add dockerfiles for the Debian/Ubuntu supported releases.


I have a preliminary patch locally that adds support for building for
both Debian and Ubuntu. I will be applying some polish next week and
then working with the Delorean guys to get it landed.



Thanks Steve, looking forward to seeing it

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [packaging] Adding packaging as an OpenStack project

2015-05-28 Thread Derek Higgins



On 28/05/15 12:07, Jaume Devesa wrote:

Hi Thomas,

Delorean is a tool to build rpm packages from master branches (maybe any
branch?) of OpenStack projects.

Check out here:
https://www.rdoproject.org/packaging/rdo-packaging.html#master-pkg-guide


Following those instructions you'll notice that the rpm's are being 
built using rpmbuild inside a docker container, if expanding to add dep 
support this is where we could plug in sbuild.





Regards,


On Thu, 28 May 2015 10:40, Thomas Goirand wrote:

Derek,

Thanks for what you wrote.

On 05/27/2015 11:26 PM, Derek Higgins wrote:

4. For deb packages you can create new repositories along side the
rdorpm-* repositories


My intention is to use deb-* as prefix, if Canonical team agrees.


   5. Add deb support to delorean, I know of at least one person who has
already explored this (steve cc'd), if delorean is too far off the path
of what we want to achieve and there is a better tool then I'm open to
change.


I don't know delorean at all, but what should be kept in mind is that,
for Debian and Ubuntu, we *must* use sbuild, which is what is used on
the buildd networks.

I also started working on openstack-pkg-tools to provide such sbuild
based build env, so I'm not sure if we need to switch to Delorean. Could
you point me to some documentation about it, so I can see by myself what
Delorean is about?

Cheers,

Thomas

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [packaging] Adding packaging as an OpenStack project

2015-05-27 Thread Derek Higgins

On 27/05/15 09:14, Thomas Goirand wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi all,

tl;dr:
- - We'd like to push distribution packaging of OpenStack on upstream
gerrit with reviews.
- - The intention is to better share the workload, and improve the overall
QA for packaging *and* upstream.
- - The goal is *not* to publish packages upstream
- - There's an ongoing discussion about using stackforge or openstack.
This isn't, IMO, that important, what's important is to get started.
- - There's an ongoing discussion about using a distribution specific
namespace, my own opinion here is that using /openstack-pkg-{deb,rpm} or
/stackforge-pkg-{deb,rpm} would be the most convenient because of a
number of technical reasons like the amount of Git repository.
- - Finally, let's not discuss for too long and let's do it!!! :)

Longer version:

Before I start: some stuff below is just my own opinion, others are just
given facts. I'm sure the reader is smart enough to guess which is what,
and we welcome anyone involved in the project to voice an opinion if
he/she differs.

During the Vancouver summit, operation, Canonical, Fedora and Debian
people gathered and collectively expressed the will to maintain
packaging artifacts within upstream OpenStack Gerrit infrastructure. We
haven't decided all the details of the implementation, but spent the
Friday morning together with members of the infra team (hi Paul!) trying
to figure out what and how.

A number of topics have been raised, which needs to be shared.

First, we've been told that such a topic deserved a message to the dev
list, in order to let groups who were not present at the summit. Yes,
there was a consensus among distributions that this should happen, but
still, it's always nice to let everyone know.

So here it is. Suse people (and other distributions), you're welcome to
join the effort.

- - Why doing this

It's been clear to both Canonical/Ubuntu teams, and Debian (ie: myself)
that we'd be a way more effective if we worked better together, on a
collaborative fashion using a review process like on upstream Gerrit.
But also, we'd like to welcome anyone, and especially the operation
folks, to contribute and give feedback. Using Gerrit is the obvious way
to give everyone a say on what we're implementing.

As OpenStack is welcoming every day more and more projects, it's making
even more sense to spread the workload.

This is becoming easier for Ubuntu guys as Launchpad now understand not
only BZR, but also Git.

We'd start by merging all of our packages that aren't core packages
(like all the non-OpenStack maintained dependencies, then the Oslo libs,
then the clients). Then we'll see how we can try merging core packages.

Another reason is that we believe working with the infra of OpenStack
upstream will improve the overall quality of the packages. We want to be
able to run a set of tests at build time, which we already do on each
distribution, but now we want this on every proposed patch. Later on,
when we have everything implemented and working, we may explore doing a
package based CI on every upstream patch (though, we're far from doing
this, so let's not discuss this right now please, this is a very long
term goal only, and we will have a huge improvement already *before*
this is implemented).

- - What it will *not* be
===
We do not have the intention (yet?) to publish the resulting packages
built on upstream infra. Yes, we will share the same Git repositories,
and yes, the infra will need to keep a copy of all builds (for example,
because core packages will need oslo.db to build and run unit tests).
But we will still upload on each distributions on separate repositories.
So published packages by the infra isn't currently discussed. We could
get to this topic once everything is implemented, which may be nice
(because we'd have packages following trunk), though please, refrain to
engage in this topic right now: having the implementation done is more
important for the moment. Let's try to stay on tracks and be constructive.

- - Let's keep efficiency in mind
===
Over the last few years, I've been able to maintain all of OpenStack in
Debian with little to no external contribution. Let's hope that the
Gerrit workflow will not slow down too much the packaging work, even if
there's an unavoidable overhead. Hopefully, we can implement some
liberal ACL policies for the core reviewers so that the Gerrit workflow
don't slow down anyone too much. For example we may be able to create
new repositories very fast, and it may be possible to self-approve some
of the most trivial patches (for things like typo in a package
description, adding new debconf translations, and such obvious fixes, we
shouldn't waste our time).

There's a middle ground between the current system (ie: only write
access ACLs for git.debian.org with no other check what so ever) and a
too restrictive fully protected gerrit 

Re: [openstack-dev] [TripleO] Core reviewer update proposal

2015-05-05 Thread Derek Higgins



On 05/05/15 12:57, James Slagle wrote:

Hi, I'd like to propose adding Giulio Fidente and Steve Hardy to TripleO Core.

Giulio has been an active member of our community for a while. He
worked on the HA implementation in the elements and recently has been
making a lot of valuable contributions and reviews related to puppet
in the manifests, heat templates, ceph, and HA.

Steve Hardy has been instrumental in providing a lot of Heat domain
knowledge to TripleO and his reviews and guidance have been very
beneficial to a lot of the template refactoring. He's also been
reviewing and contributing in other TripleO projects besides just the
templates, and has shown a solid understanding of TripleO overall.

180 day stats:
| gfidente | 2080  42 166   0   079.8% |
16 (  7.7%)  |
|  shardy  | 2060  27 179   0   086.9% |
16 (  7.8%)  |

TripleO cores, please respond with +1/-1 votes and any
comments/objections within 1 week.


+1 to both



Giulio and Steve, also please do let me know if you'd like to serve on
the TripleO core team if there are no objections.

I'd also like to give a heads-up to the following folks whose review
activity is very low for the last 90 days:
|   tomas-8c8 **   |   80   0   0   8   2   100.0% |0 (  0.0%)  |
|lsmola ** |   60   0   0   6   5   100.0% |0 (  0.0%)  |
| cmsj **  |   60   2   0   4   266.7% |0 (  0.0%)  |
|   jprovazn **|   10   1   0   0   0 0.0% |0 (  0.0%)  |
|   jonpaul-sullivan **|  no activity
Helping out with reviewing contributions is one of the best ways we
can make good forward progress in TripleO. All of the above folks are
valued reviewers and we'd love to see you review more submissions. If
you feel that your focus has shifted away from TripleO and you'd no
longer like to serve on the core team, please let me know.

I also plan to remove Alexis Lee from core, who previously has
expressed that he'd be stepping away from TripleO for a while. Alexis,
thank you for reviews and contributions!



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] TripleO: CI down... SSL cert expired

2015-04-13 Thread Derek Higgins

On 11/04/15 14:02, Dan Prince wrote:

Looks like our SSL certificate has expired for the currently active CI
cloud. We are working on getting a new one generated and installed.
Until then CI jobs won't get processed.


A new cert has been installed in the last few minutes and ZUUL has 
started kicking off new jobs so we should be through the backlog soon.


At this weeks meeting we'll discuss putting something in place to ensure 
we are ahead of this the next time.


Derek



Dan


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] CI outage

2015-03-30 Thread Derek Higgins


Tl;dr tripleo ci is back up and running, see below for more

On 21/03/15 01:41, Dan Prince wrote:

Short version:

The RH1 CI region has been down since yesterday afternoon.

We have a misbehaving switch and have file a support ticket with the
vendor to troubleshoot things further. We hope to know more this
weekend, or Monday at the latest.

Long version:

Yesterday afternoon we started seeing issues in scheduling jobs on the
RH1 CI cloud. We haven't made any OpenStack configuration changes
recently, and things have been quite stable for some time now (our
uptime was 365 days on the controller).

Initially we found a misconfigured Keystone URL which was preventing
some diagnostic queries via OS clients external to the rack. This
setting hadn't been recently changed however and didn't seem to bother
nodepool before so I don't think it is the cause of the outage...

MySQL also got a bounce. It seemed happy enough after a restart as well.

After fixing the keystone setting and bouncing MySQL instances appears
to go ACTIVE but we were still having connectivity issues getting
floating IPs and DHCP working on overcloud instances. After a good bit
of debugging we started looking at the switches. Turns out one of them
has a high CPU usuage (above the warning threshold) and MAC address are
also unstable (ports are moving around).

Until this is resolved RH1 is unavailable to host jobs CI jobs. Will
post back here with an update once we have more information.


RH1 has been running as expected since last Thursday afternoon which 
means the cloud was down for almost a week, I'm left not entirely sure 
what some problems were, at various times during the week we tried a 
number of different interventions which may have caused (or exposed) 
some of our problems, e.g.


at one stage we restarted openvswitch in an attempt to ensure nothing 
had gone wrong with our ovs tunnels, around the same time (and possible 
caused by the restart), we started getting progressively worse 
connections to some of our servers. With lots of entries like this on 
our bastion server
Mar 20 13:22:49 host01-rack01 kernel: bond0.5: received packet with own 
address as source address


Not linking the restart with the looping packets message and instead 
thinking we may have a problem with the switch we put in a call with our 
switch vendor.


Continuing to chase down a problem on our own servers we noticed that 
tcpdump was reporting at times about 100,000 ARP packets per second 
(sometimes more).


Various interventions stopped the excess broadcast traffic e.g.
  Shutting down most of the compute nodes stopped the excess traffic, 
but the problem wasn't linked to any one particular compute node
  Running the tripleo os-refresh-config script on each compute node 
stopped the excess traffic


But restarting the controller node caused the excess traffic to return

Eventually we got the cloud running without the flood of broadcast 
traffic, with a small number of compute nodes, but instances still 
weren't getting IP address, with nova and neutron in debug mode we saw 
an error where nova failing to mount the qcow image (iirc it was 
attempting to resize the image).


Unable to figure out why this was working in the past but now isn't we 
redeployed this single compute node using the original image that was 
used (over a year ago), instances on this compute node we're booting but 
failing to get an IP address, we noticed this was because of a 
difference between the time on the controller when compared to the 
compute node. After resetting the time, now instances were booting and 
networking was working as expected (this was now Wednesday evening).


Looking back at the error while mounting the qcow image, I believe this 
was a red herring, it looks like this problem was always present on our 
system but we didn't have scary looking tracebacks in the logs until we 
switched to debug mode.


Now pretty confident we can get back to a running system by starting up 
all the compute nodes again and ensuring the os-refresh-config scripts 
were run then ensuring the times were all set on each host properly we 
decided to remove any entropy the may have built up while debugging 
problems on each computes node so we redeployed all of our compute nodes 
from scratch. This all went as expected but was a little time consuming 
as we spent time to verify each step as we went along, the steps went 
something like this


o with the exception of the overcloud controller, nova delete all of 
the hosts on the undercloud (31 hosts)


o we now have a problem, in tripleo the controller and compute nodes are 
tied together in a single heat template, so we need the heat template 
that was used a year ago to deploy the whole overcloud along with the 
parameters that were passed into it, we had actually done this before 
when adding new compute nodes to the cloud so it wasn't new territory.
   o Use heat template-show ci-overcloud to get the original heat 
template (a 

Re: [openstack-dev] [TripleO] update on Puppet integration in Kilo

2015-02-18 Thread Derek Higgins


On 11/02/15 17:06, Dan Prince wrote:

I wanted to take a few minutes to go over the progress we've made with
TripleO Puppet in Kilo so far.

For those unfamilar with the efforts our initial goal was to be able to
use Puppet as the configuration tool for a TripleO deployment stack.
This is largely built around a Heat capability added in Icehouse called
Software Deployments. By making use of use of the Software Deployment
Puppet hook and building our images with a few puppet specific elements
we can integrate with puppet as a configuration tool. There has been no
blueprint on this effort... blueprints seemed a bit ridged for the task
at hand. After demo'ing the proof of concept patches in Paris we've been
tracking progress on an etherpad here:

https://etherpad.openstack.org/p/puppet-integration-in-heat-tripleo

Lots of details in that etherpad. But I would like to highlight a few
things:

As of a week or so all of the code needed to run devtest_overcloud.sh to
completion using Puppet (and Fedora packages) has landed. Several
upstream TripleO developers have been successful in setting up a Puppet
overcloud using this process.

As of last Friday We have a running CI job! I'm actually very excited
about this one for several reasons. First CI is going to be crucial in
completing the rest of the puppet feature work around HA, etc. Second
because this job does require packages... and a fairly recent Heat
release we are using a new upstream packaging tool called Delorean.
Delorean makes it very easy to go back in time so if the upstream
packages break for some reason plugging in a stable repo from yesterday,
or 5 minutes ago should be a quick fix... Lots of things to potentially
talk about in this area around CI on various projects.

The puppet deployment is also proving to be quite configurable. We have
a Heat template parameter called 'EnablePackageInstall' which can be
used to enable or disable Yum package installation at deployment time.
So if you want to do traditional image based deployment with images
containing all of your packages you can do that (no Yum repositories
required). Or if you want to roll out images and install or upgrade
packages at deployment time (rather than image build time) you can do
that too... all by simply modifying this parameter. I think this sort of
configurability should prove useful to those who want a bit of choice
with regards to how packages and the like get installed.

Lots of work is still ongoing (documented in the etherpad above for
now). I would love to see multi-distro support for Puppet configuration
w/ TripleO. At present time we are developing and testing w/ Fedora...
but because puppet is used as a configuration tool I would say adding
multi-distro support should be fairly straightforward. Just a couple of
bits in the tripleo-puppet-elements... and perhaps some upstream
packages too (Delorean would be a good fit here for multi-distro too).
Great work, I'd love to see us either replace this fedora job with a 
centos one or add another job for centos, I think this would better 
represent what we expect endusers to deploy. Of course we would need 
todo a bit of work to make that work, I'd be willing to help out here if 
we decide to do it.




Also, the feedback from those in the Puppet community has been
excellent. Emilien Macchi, Yanis Guenene, Spencer Krum, and Colleen
Murphy have all been quite helpful with questions about style, how to
best use the modules, etc.

Likewise, Steve Hardy and Steve Baker have been very helpful in
addressing issues in the Heat templates.

Appreciate all the help and feedback.

Dan


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Bug squashing followup

2015-01-28 Thread Derek Higgins
Take up on this was a bit lower then I would have hoped but lets go
ahead with it anyways ;-)

we had 6 volunteers altogether.

I've taken the list of current tripleo bugs and split them into groups
of 15 (randomly shuffled) and assigned a group to each volunteer.

Each person should take a look at the bugs they have been assigned
decide if it is still current(contact others if necessary), if its not
close it. Once your happy a bug has been correctly assessed then strike
it through on the etherpad.

To others, its not too late ;-), just add your name to the etherpad and
assign yourself a group of 15 bugs.

thanks,
Derek.

On 18/12/14 11:25, Derek Higgins wrote:
 While bug squashing yesterday, I went through quite a lot of bugs
 closing those that were already fixed or no longer relevant, closing
 around 40 bugs. I eventually ran out of time, but I'm pretty sure if we
 split the task up between us we could weed out a lot more.
 
 What I'd like to do is, as a once off, randomly split up all the bugs to
 a group of volunteers (hopefully a large number of people), each person
 gets assigned X number of bugs and is then responsible for just deciding
 if it is still a relevant bug (or finding somebody who can help decide)
 and closing if necessary. Nothing needs to get fixed here we just need
 to make sure people are have a uptodate list of relevant bugs.
 
 So who wants to volunteer? We probably need about 15+ people for this to
 be split into manageable chunks. If your willing to help out just add
 your name to this list
 https://etherpad.openstack.org/p/tripleo-bug-weeding
 
 If we get enough people I'll follow up by splitting out the load and
 assigning to people.
 
 The bug squashing day yesterday put a big dent in these, but wasn't
 entirely focused on weeding out stale bugs, some people probably got
 caught up fixing individual bugs and it wasn't helped by a temporary
 failure of our CI jobs (provoked by a pbr update and we were building
 pbr when we didn't need to be).
 
 thanks,
 Derek.
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Switching CI back to amd64

2015-01-08 Thread Derek Higgins
On 07/01/15 23:41, Ben Nemec wrote:
 On 01/07/2015 11:29 AM, Clint Byrum wrote:
 Excerpts from Derek Higgins's message of 2015-01-07 02:51:41 -0800:
 Hi All,
 I intended to bring this up at this mornings meeting but the train I
 was on had no power sockets (and I had no battery) so sending to the
 list instead.

 We currently run our CI with on images built for i386, we took this
 decision a while back to save memory ( at the time is allowed us to move
 the amount of memory required in our VMs from 4G to 2G (exactly where in
 those bands the hard requirements are I don't know)

 Since then we have had to move back to 3G for the i386 VM as 2G was no
 longer enough so the saving in memory is no longer as dramatic.

 Now that the difference isn't as dramatic, I propose we switch back to
 amd64 (with 4G vms) in order to CI on what would be closer to a
 production deployment and before making the switch wanted to throw the
 idea out there for others to digest.

 This obviously would impact our capacity as we will have to reduce the
 number of testenvs per testenv hosts. Our capacity (in RH1 and roughly
 speaking) allows us to run about 1440 ci jobs per day. I believe we can
 make the switch and still keep capacity above 1200 with a few other changes
 1. Add some more testenv hosts, we have 2 unused hosts at the moment and
 we can probably take 2 of the compute nodes from the overcloud.
 2. Kill VM's at the end of each CI test (as opposed to leaving them
 running until the next CI test kills them), allowing us to more
 successfully overcommit on RAM
 3. maybe look into adding swap on the test env hosts, they don't
 currently have any, so over committing RAM is a problem the the OOM
 killer is handling from time to time (I only noticed this yesterday).

 The other benefit to doing this is that is we were to ever want to CI
 images build with packages (this has come up in previous meetings) we
 wouldn't need to provide i386 packages just for CI, while the rest of
 the world uses the amd64.

 +1 on all counts.

 It's also important to note that we should actually have a whole new
 rack of servers added to capacity soon (I think soon is about 6 months
 so far, but we are at least committed to it). So this would be, at worst,
 a temporary loss of 240 jobs per day.
 
 Actually it should be sooner than that - hp1 still isn't in the CI
 rotation yet, so once that infra change merges (the only thing
 preventing us from using it AFAIK) we'll be getting a bunch more
 capacity in the much nearer term.  Unless Derek is already counting that
 in his estimates above, of course.
Yes, this is correct, hp1 isn't in use at the moment and approximately
double those numbers.

 
 I don't feel like we've been all that capacity constrained lately
 anyway, so as I said in my other (largely unnecessary, as it turns out)
 email, I'm +1 on doing this.
Correct we're not currently constrained on capacity at all (most days we
run less then 300 jobs), but once the other region is in use we'll be
hoping to add jobs to other projects.

 
 -Ben
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] default region name

2015-01-08 Thread Derek Higgins
On 08/01/15 05:21, Zhou, Zhenzan wrote:
 Hi, 
 
 Does anyone know why TripleO uses regionOne as default region name? A 
 comment in the code says it's the default keystone uses. 
 But I cannot find any regionOne in keystone code. Devstack uses RegionOne 
 by default and I do see lots of RegionOne in keystone code.

Looks like this has been changing in various places
https://bugs.launchpad.net/keystone/+bug/1252299

I guess the default the code is referring to is in keystoneclient
http://git.openstack.org/cgit/openstack/python-keystoneclient/tree/keystoneclient/v2_0/shell.py#n509



 
 stack@u140401:~/openstack/tripleo-incubator$ grep -rn regionOne *
 scripts/register-endpoint:26:REGION=regionOne # NB: This is the default 
 keystone uses.
 scripts/register-endpoint:45:echo -r, --region  -- Override the 
 default region 'regionOne'.
 scripts/setup-endpoints:33:echo -r, --region-- Override 
 the default region 'regionOne'.
 scripts/setup-endpoints:68:REGION=regionOne #NB: This is the keystone 
 default.
 stack@u140401:~/openstack/tripleo-incubator$ grep -rn regionOne 
 ../tripleo-heat-templates/
 stack@u140401:~/openstack/tripleo-incubator$  grep -rn regionOne 
 ../tripleo-image-elements/
 ../tripleo-image-elements/elements/tempest/os-apply-config/opt/stack/tempest/etc/tempest.conf:10:region
  = regionOne
 ../tripleo-image-elements/elements/neutron/os-apply-config/etc/neutron/metadata_agent.ini:3:auth_region
  = regionOne
 stack@u140401:~/openstack/keystone$ grep -rn RegionOne * | wc -l
 130
 stack@u140401:~/openstack/keystone$ grep -rn regionOne * | wc -l
 0
 
 Another question is that TripleO doesn't export OS_REGION_NAME in stackrc.  
 So when someone source devstack rc file 
 to do something and then source TripleO rc file again, the OS_REGION_NAME 
 will be the one set by devstack rc file. 
 I know this may be strange but isn't it better to use the same default value?

We should probably add that to our various rc files, not having it there
is probably the reason we used keystoneclients default in the first place.

 
 Thanks a lot.
 
 BR
 Zhou Zhenzan
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Switching CI back to amd64

2015-01-07 Thread Derek Higgins
Hi All,
I intended to bring this up at this mornings meeting but the train I
was on had no power sockets (and I had no battery) so sending to the
list instead.

We currently run our CI with on images built for i386, we took this
decision a while back to save memory ( at the time is allowed us to move
the amount of memory required in our VMs from 4G to 2G (exactly where in
those bands the hard requirements are I don't know)

Since then we have had to move back to 3G for the i386 VM as 2G was no
longer enough so the saving in memory is no longer as dramatic.

Now that the difference isn't as dramatic, I propose we switch back to
amd64 (with 4G vms) in order to CI on what would be closer to a
production deployment and before making the switch wanted to throw the
idea out there for others to digest.

This obviously would impact our capacity as we will have to reduce the
number of testenvs per testenv hosts. Our capacity (in RH1 and roughly
speaking) allows us to run about 1440 ci jobs per day. I believe we can
make the switch and still keep capacity above 1200 with a few other changes
1. Add some more testenv hosts, we have 2 unused hosts at the moment and
we can probably take 2 of the compute nodes from the overcloud.
2. Kill VM's at the end of each CI test (as opposed to leaving them
running until the next CI test kills them), allowing us to more
successfully overcommit on RAM
3. maybe look into adding swap on the test env hosts, they don't
currently have any, so over committing RAM is a problem the the OOM
killer is handling from time to time (I only noticed this yesterday).

The other benefit to doing this is that is we were to ever want to CI
images build with packages (this has come up in previous meetings) we
wouldn't need to provide i386 packages just for CI, while the rest of
the world uses the amd64.

Thanks,
Derek.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Bug squashing followup

2015-01-05 Thread Derek Higgins
See below, We got only 5 people signed up, anybody else willing to join
the effort ?

TL;DR if your willing to assess about 15 tripleo related bugs to decide
if they are still current then add your name to the etherpad.

thanks,
Derek

On 18/12/14 11:25, Derek Higgins wrote:
 While bug squashing yesterday, I went through quite a lot of bugs
 closing those that were already fixed or no longer relevant, closing
 around 40 bugs. I eventually ran out of time, but I'm pretty sure if we
 split the task up between us we could weed out a lot more.
 
 What I'd like to do is, as a once off, randomly split up all the bugs to
 a group of volunteers (hopefully a large number of people), each person
 gets assigned X number of bugs and is then responsible for just deciding
 if it is still a relevant bug (or finding somebody who can help decide)
 and closing if necessary. Nothing needs to get fixed here we just need
 to make sure people are have a uptodate list of relevant bugs.
 
 So who wants to volunteer? We probably need about 15+ people for this to
 be split into manageable chunks. If your willing to help out just add
 your name to this list
 https://etherpad.openstack.org/p/tripleo-bug-weeding
 
 If we get enough people I'll follow up by splitting out the load and
 assigning to people.
 
 The bug squashing day yesterday put a big dent in these, but wasn't
 entirely focused on weeding out stale bugs, some people probably got
 caught up fixing individual bugs and it wasn't helped by a temporary
 failure of our CI jobs (provoked by a pbr update and we were building
 pbr when we didn't need to be).
 
 thanks,
 Derek.
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Bug squashing followup

2014-12-18 Thread Derek Higgins
While bug squashing yesterday, I went through quite a lot of bugs
closing those that were already fixed or no longer relevant, closing
around 40 bugs. I eventually ran out of time, but I'm pretty sure if we
split the task up between us we could weed out a lot more.

What I'd like to do is, as a once off, randomly split up all the bugs to
a group of volunteers (hopefully a large number of people), each person
gets assigned X number of bugs and is then responsible for just deciding
if it is still a relevant bug (or finding somebody who can help decide)
and closing if necessary. Nothing needs to get fixed here we just need
to make sure people are have a uptodate list of relevant bugs.

So who wants to volunteer? We probably need about 15+ people for this to
be split into manageable chunks. If your willing to help out just add
your name to this list
https://etherpad.openstack.org/p/tripleo-bug-weeding

If we get enough people I'll follow up by splitting out the load and
assigning to people.

The bug squashing day yesterday put a big dent in these, but wasn't
entirely focused on weeding out stale bugs, some people probably got
caught up fixing individual bugs and it wasn't helped by a temporary
failure of our CI jobs (provoked by a pbr update and we were building
pbr when we didn't need to be).

thanks,
Derek.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] How do the CI clouds work?

2014-12-18 Thread Derek Higgins
On 18/12/14 08:48, Steve Kowalik wrote:
 Hai,
 
   I am finding myself at a loss at explaining how the CI clouds that run
 the tripleo jobs work from end-to-end. I am clear that we have a tripleo
 deployment running on those racks, with a seed, a HA undercloud and
 overcloud, but then I'm left with a number of questions, such as:
Yup, this is correct, from a CI point of view all that is relevant is
the overcloud and a set of baremetal test env hosts. The seed and
undercloud are there because we used tripleo to deploy the thing in the
first place.

 
   How do we run the testenv images on the overcloud?
nodepool talks to our overcloud to create an instance where the jenkins
jobs run. This jenkins node is where we build the images, jenkins
doesn't manage and isn't aware of the testenvs hosts.

The entry point for jenkins to run tripleo ci is toci_gate_test.sh, at
the end of this script you'll see a call to testenv-client[1]

testenv-client talks to gearman (an instance on our overcloud, a
different gearman instance to what infra have), gearman responds with a
json file representing one of the the testenv's that have been
registered with it.

testenv-client then runs the command ./toci_devtest.sh and passes in
the json file (via $TE_DATAFILE). To prevent 2 CI jobs using the same
testenv, the testenv is now locked until toci_devtest exits. The
jenkins node now has all the relevant IPs and MAC addresses to talk to
the testenv.

 
   How do the testenv images interact with the nova-compute machines in
 the overcloud?
The images are built on instances in this cloud. The MAC address of eth1
on the seed in for the testenv has been registered with neutron on the
overcloud, so its IP is known (its in the json file we got in
$TE_DATAFILE). All traffic to the other instances in the CI testenv is
routed though the seed its eth2 shares a ovs bridge with eth1 from the
other VM's in the same testenv.

 
   Are the machines running the testenv images meant to be long-running,
 or are they recycled after n number of runs?
They are long running and in theory shouldn't need to be recycled, in
practice they get recycled sometimes for one of 2 reason
1. The image needs to be updated (e.g. to increase the amount of RAM on
the vibvirt domains they host)
2. If one is experiencing a problem, I usually do a nova rebuild on
it, this doesn't happen very frequently, we currently have 15 TE hosts
on rh1 7 have an uptime over 80 days, while the others are new HW that
was added last week. But problems we have encountered in the passed
causing a rebuild include a TE Host loosing its IP or
https://bugs.launchpad.net/tripleo/+bug/1335926
https://bugs.launchpad.net/tripleo/+bug/1314709

 
 Cheers,
No problem I tried to document this at one stage here[2] but feel free
to add more or point out where its lacking or ask questions here and
I'll attempt to answer.

thanks,
Derek.


[1]
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/toci_gate_test.sh?id=3d86dd4c885a68eabddb7f73a6dbe6f3e75fde64#n69
[2]
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/docs/TripleO-ci.rst

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] CI report : 1/11/2014 - 4/12/2014

2014-12-08 Thread Derek Higgins
On 04/12/14 13:37, Dan Prince wrote:
 On Thu, 2014-12-04 at 11:51 +, Derek Higgins wrote:
 A month since my last update, sorry my bad

 since the last email we've had 5 incidents causing ci failures

 26/11/2014 : Lots of ubuntu jobs failed over 24 hours (maybe half)
 - We seem to suffer any time an ubuntu mirror isn't in sync causing hash
 mismatch errors. For now I've pinned DNS on our proxy to a specific
 server so we stop DNS round robining
 
 This sound fine to me. I personally like the model where you pin to a
 specific mirror, perhaps one that is geographically closer to your
 datacenter. This also makes Squid caching (in the rack) happier in some
 cases.
 

 21/11/2014 : All tripleo jobs failed for about 16 hours
 - Neutron started asserting that local_ip be set to a valid ip address,
 on the seed we had been leaving it blank
 - Cinder moved to using  oslo.concurreny which in turn requires that
 lock_path be set, we are now setting it
 
 
 Thinking about how we might catch these ahead of time with our limited
 resources ATM. These sorts of failures all seem related to configuration
 and or requirements changes. I wonder if we were to selectively
 (automatically) run check experimental jobs on all reviews with
 associated tickets which have either doc changes or modify
 requirements.txt. Probably a bit of work to pull this off but if we had
 a report containing these results coming down the pike we might be
 able to catch them ahead of time.
Yup, this sounds like it could be beneficial, alternatively if we soon
have the capacity to run on more projects (capacity is increasing) we'll
be running on all reviews and we'll be able to generate the report your
talking about, either way we should do something like this soon.

 
 

 8/11/2014 : All fedora tripleo jobs failed for about 60 hours (over a
 weekend)
 - A url being accessed on  https://bzr.linuxfoundation.org is no longer
 available, we removed the dependency

 7/11/2014 : All tripleo tests failed for about 24 hours
 - Options were removed from nova.conf that had been deprecated (although
 no deprecation warnings were being reported), we were still using these
 in tripleo

 as always more details can be found here
 https://etherpad.openstack.org/p/tripleo-ci-breakages
 
 Thanks for sending this out! Very useful.
no problem
 
 Dan
 

 thanks,
 Derek.

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Alternate meeting time

2014-12-02 Thread Derek Higgins
On 02/12/14 14:10, James Polley wrote:
 Months ago, I pushed for us to alternate meeting times to something that
 was friendlier to me, so we started doing alternate weeks at 0700UTC.
 That worked well for me, but wasn't working so well for a few people in
 Europe, so we decided to give 0800UTC a try. Then DST changes happened,
 and wiki pages got out of sync, and there was confusion about what time
 the meeting is at..
 
 The alternate meeting hasn't been very well attended for the last ~3
 meetings. Partly I think that's due to summit and travel plans, but it
 seems like the 0800UTC time doesn't work very well for quite a few people.
 
 So, instead of trying things at random, I've
 created https://etherpad.openstack.org/p/tripleo-alternate-meeting-time
 as a starting point for figuring out what meeting time might work well
 for the most people. Obviously the world is round, and people have
 different schedules, and we're never going to get a meeting time that
 works well for everyone - but it'd be nice to try to maximise attendance
 (and minimise inconvenience) as much as we can.
 
 If you regularly attend, or would like to attend, the meeting, please
 take a moment to look at the etherpad to register your vote for which
 time works best for you. There's even a section for you to cast your
 vote if the UTC1900 meeting (aka the main or US-Friendly meeting)
 works better for you! 


Can I suggest an alternative data gathering method, I've put each hour
in a week in a poll, for each slot you have 3 options

Yes, If needs be and no

if we all were to fill in this poll with what suits each of us we should
easily see the overlaps,

I just picked a week, ignore the dates, Assume times are UTC. Any
thoughts on this? I think it would allow us to explore all options
available.

http://doodle.com/27ffgkdm5gxzr654


 
 
 
 
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


  1   2   >