With large sets of nodes to introspect, we typically avoid bulk introspection. I have written a quick script that introspects a couple of nodes at a time: https://gist.github.com/jtaleric/fcca3811cd4d8f37336f9532e5b9c9ff
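For anyone who does not want to open the gist, here is a rough Python sketch of the general idea: small batches, with retries and a progressively longer delay. It is not the script from the gist. The `introspect` callable is a hypothetical stand-in for whatever actually introspects one node and reports success, and the names `introspect_in_batches`, `batch_size`, `max_attempts` and `base_delay` are made up for illustration. For simplicity it handles the nodes in a batch one after another.

```python
import time
from typing import Callable, Dict, Iterable, List


def batches(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def introspect_in_batches(
    nodes: List[str],
    introspect: Callable[[str], bool],  # hypothetical: True means the node succeeded
    batch_size: int = 2,
    max_attempts: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Introspect nodes a few at a time, retrying failures within each batch."""
    results: Dict[str, bool] = {}
    for batch in batches(nodes, batch_size):
        pending = list(batch)
        for attempt in range(1, max_attempts + 1):
            # Retry only the nodes in this batch that have not succeeded yet.
            pending = [node for node in pending if not introspect(node)]
            if not pending:
                break
            if attempt < max_attempts:
                # Progressive delay: wait longer after each failed attempt.
                sleep(base_delay * attempt)
        for node in batch:
            results[node] = node not in pending
    return results
```

Nodes still marked False after `max_attempts` are reported rather than hidden, so a real bug still shows up as a failure instead of being papered over by the retries.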
Maybe we can add this sort of logic to bulk introspection, with some retries?

On Tue, Oct 18, 2016 at 8:29 AM, John Trowbridge <tr...@redhat.com> wrote:
>
>
> On 10/18/2016 07:20 AM, Wesley Hayutin wrote:
>> See my response inline.
>>
>> On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur <dtant...@redhat.com> wrote:
>>
>>> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>>>
>>>> Greetings,
>>>>
>>>> The RDO CI team is considering adding retries to our calls to
>>>> introspection again [1].
>>>> This is very handy for bare metal environments where retries may be
>>>> needed due to random chaos in the environment itself.
>>>>
>>>> We're trying to balance two things here:
>>>> 1. reduce the number of false negatives in CI
>>>> 2. try not to overstep what CI should do vs. what the product should do.
>>>>
>>>> We would like to hear your comments if you think this is acceptable for
>>>> CI or if this may be overstepping.
>>>>
>>>> Thank you
>>>>
>>>>
>>>> [1] http://paste.openstack.org/show/586035/
>>>>
>>>
>>> Hi!
>>>
>>> I probably lack some context on exactly what problems you face. I don't
>>> have any disagreement with retrying it, just want to make sure we're not
>>> missing actual bugs.
>>>
>>
>> I agree, we have to be careful not to paper over bugs while we try to
>> overcome typical environmental delays that come w/ booting, rebooting $x
>> number of random hardware nodes.
>> To make this a little more crystal clear, what I'm trying to determine is
>> where progressive delays and retries should be injected into the workflow
>> of deploying an overcloud.
>> Should we add options in the product itself that allow for $x number of
>> retries w/ a configurable set of delays for introspection? [2] Is the
>> expectation that this works the first time, every time?
>> Are we overstepping what CI should do by implementing [1]?
>
> IMO, yes, we are overstepping what CI should be doing with [1]. Mostly
> because we are providing a better UX in CI than an actual user will get.
>>
>> Additionally, would it be appropriate to implement [1] while [2] is
>> developed for the next release, and is it OK to use [1] with older releases?
>>
>
> However, I think it is ok to implement [1] in CI, if the following are true:
>
> 1) There is an in-progress bug to make this UX better for non-CI users.
> 2) For older releases, said bug is deemed inappropriate for backport.
>
>> Thanks for your time and responses.
>>
>>
>> [1] http://paste.openstack.org/show/586035/
>> [2]
>> https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169
>>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev