Re: Testing Leader Election reconfiguration

Tom Barber Tue, 15 Mar 2016 13:37:38 -0700

Hey Cory

Not even I'm that crazy! :) I have recycled the bootstrapped test
environment but the only nodes running are those used in this test suite.


I tried to use wait_for_messages initially and was a little confused as to
what a "message" equated to (in those tests I got a timeout as well, but
I'm happy to retest).

If I want to wait for a message, is it the text on the right-hand side of
status_set? For example, at
https://github.com/OSBI/layer-pdi/blob/master/reactive/pdi.py#L83
is it 'Configuration has changed, restarting Carte.'?
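
To make the question concrete, here is a minimal sketch of the kind of match
I mean. It does not call amulet; `unit_messages` and `all_units_match` are
made-up names, and the hard-coded dict stands in for whatever the units
report via status_set. It assumes, as I read the amulet docs Cory linked,
that a "message" can be given either as an exact string or as a regular
expression:

```python
import re

# Hypothetical snapshot of workload-status messages per unit. In a real test
# these would come from the deployment, not from a hard-coded dict.
unit_messages = {
    'pdi/0': 'Configuration has changed, restarting Carte.',
    'pdi/1': 'Configuration has changed, restarting Carte.',
}


def all_units_match(messages, expected):
    """Return True if every unit's message matches `expected`.

    `expected` is either an exact string or a compiled regular expression.
    """
    for message in messages.values():
        if hasattr(expected, 'search'):
            # Compiled regex: a match anywhere in the message is enough.
            if not expected.search(message):
                return False
        elif message != expected:
            # Plain string: the whole message has to be identical.
            return False
    return True


print(all_units_match(unit_messages,
                      'Configuration has changed, restarting Carte.'))
print(all_units_match(unit_messages, re.compile(r'restarting Carte')))
```

If that reading is right, the amulet call would presumably be something like
self.d.sentry.wait_for_messages({'pdi': re.compile('restarting Carte')}),
but I have not confirmed that.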

Thanks

Tom

--------------

Director Meteorite.bi - Saiku Analytics Founder
Tel: +44(0)5603641316

(Thanks to the Saiku community we reached our Kickstart
<http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
goal, but you can always help by sponsoring the project
<http://www.meteorite.bi/products/saiku/sponsorship>)

On 15 March 2016 at 17:54, Cory Johns <cory.jo...@canonical.com> wrote:

> Tom,
>
> It's also important to note that sentry.wait() waits for *all* units in
> the deployment to settle for at least 30 seconds, so it might be possible
> that another unit that wasn't included in the status gist you provided is
> churning and causing it to time out.  That's particularly possible if
> you're reusing the deployer instance and all 34+ of those machines (going
> by the machine numbers in your gist) are still extant; with that many
> machines, even the periodic update-status hooks could be overlapping enough
> to prevent the 30 second idle window from registering.
>
> I'd recommend using the wait_for_messages [1] alternative which relies on
> the charm to report its status explicitly and thus doesn't need to use
> heuristics like the 30 second idle window.  It could also make your test
> case code a bit cleaner.
>
> And, of course, reusing units when possible and cleaning up between test
> cases can help, as well.
>
> [1]:
> https://pythonhosted.org/amulet/amulet.html#amulet.sentry.Talisman.wait_for_messages
>
> On Tue, Mar 15, 2016 at 1:02 PM, Tim Van Steenburgh <
> tim.van.steenbu...@canonical.com> wrote:
>
>>
>>
>> On Tue, Mar 15, 2016 at 12:30 PM, Tom Barber <t...@analytical-labs.com>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Why would I need to increase the timeout when the status says all the
>>> units are operational?
>>>
>>
>> The default wait time is 300s, with an "idle threshold" of 30s, which
>> means it waits for everything to be idle for 30s before returning from the
>> wait. So with the default timeout, if the env doesn't settle within
>> 4m30s, it'll time out. This may not be what's happening in your case, but
>> it's worth trying a longer timeout value to make sure.
>>
>>
>>> The status dump came out of bundletester which said that it failed on
>>> the first wait(), I assume the status dump arrived at the same time?
>>> Bugs are allowed, the test was hacked up from a previous one, it doesn't
>>> do anything yet, I'm trying to make sure the logic works first.
>>>
>>> Tom
>>>
>>> On 15 March 2016 at 16:27, Tim Van Steenburgh <
>>> tim.van.steenbu...@canonical.com> wrote:
>>>
>>>> Hey Tom,
>>>>
>>>> 1. You can increase the wait time until it doesn't time out:
>>>> self.d.sentry.wait(timeout=1200)
>>>> 2. At what point in this sequence of commands was the status dump
>>>> captured?
>>>> 3. There is a bug here. You take a reference to the pdi/0 info dict on
>>>> line 1. It's the same object you use to get message2 and message3 later.
>>>> So, you'll get the same message that you got on line 1. You need
>>>> `message3 = self.d.sentry['pdi'][0].info['workload-status'].get('message')`
>>>> instead.
>>>>
>>>> Hope this helps.
>>>>
>>>> On Tue, Mar 15, 2016 at 11:41 AM, Tom Barber <t...@analytical-labs.com>
>>>> wrote:
>>>>
>>>>> Okay back here again, so my nice leader election function looks like:
>>>>>
>>>>>     def test_leader_election_failover(self):
>>>>>         unit = self.d.sentry['pdi'][0].info
>>>>>         message = unit['workload-status'].get('message')
>>>>>         ip = message.split(':', 1)[-1]
>>>>>         self.d.add_unit('pdi', 2)
>>>>>         self.d.sentry.wait()
>>>>>         message2 = unit['workload-status'].get('message')
>>>>>         ip2 = message2.split(':', 1)[-1]
>>>>>         self.assertEqual(ip, ip2)
>>>>>         self.d.remove_unit('pdi/0')
>>>>>         self.d.sentry.wait()
>>>>>         message3 = unit['workload-status'].get('message')
>>>>>         ip3 = message3.split(':', 1)[-1]
>>>>>
>>>>>         self.assertNotEqual(ip3, ip2)
>>>>>
>>>>> I know there's no logic in there, but I need to make sure the stuff
>>>>> actually functions.
>>>>>
>>>>> So Tim says wait() should work, but when I tested this last night I
>>>>> got a timeout error on the wait right after add_unit.
>>>>>
>>>>> https://gist.github.com/buggtb/c271dd79d782af57dea6
>>>>>
>>>>> Yet in the status dump you can see all 3 units sat there seemingly
>>>>> happy.
>>>>>
>>>>> Tom
>>>>>
>>>>> On 9 March 2016 at 18:31, Tom Barber <t...@analytical-labs.com> wrote:
>>>>>
>>>>>> Oh really?
>>>>>>
>>>>>> /me strokes his invisible beard.
>>>>>>
>>>>>>
>>>>>> Okay I'll go back and try again.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On 9 March 2016 at 16:56, Tim Van Steenburgh <
>>>>>> tim.van.steenbu...@canonical.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 9, 2016 at 6:31 AM, Tom Barber <t...@analytical-labs.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Stuart.
>>>>>>>>
>>>>>>>> I do put a note in my charm message indicating the leader IP
>>>>>>>> address so that users know which to connect to.
>>>>>>>>
>>>>>>>> So with juju wait, would I destroy a unit then execute juju wait?
>>>>>>>> At which point it will hang until the leader election stuff is over
>>>>>>>> and all becomes stable again?
>>>>>>>>
>>>>>>>>
>>>>>>> Since you're already using amulet, there's no need to use the
>>>>>>> juju-wait plugin since d.sentry.wait() does the same thing. So yes,
>>>>>>> you would do d.remove_unit(...) and then call d.sentry.wait().
>>>>>>>
>>>>>>>
>>>>>>>> Also, will this work if I push it upstream to the charmers and the
>>>>>>>> automated tests up there?
>>>>>>>>
>>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> On 9 March 2016 at 11:00, Stuart Bishop <
>>>>>>>> stuart.bis...@canonical.com> wrote:
>>>>>>>>
>>>>>>>>> On 9 March 2016 at 20:31, Tom Barber <t...@analytical-labs.com>
>>>>>>>>> wrote:
>>>>>>>>> > Morning all
>>>>>>>>> >
>>>>>>>>> > I'm trying to test for charm reconfiguration if the leader goes
>>>>>>>>> > AWOL.
>>>>>>>>>
>>>>>>>>> I put the role of the unit in its workload status, so it is easy for
>>>>>>>>> operators to see which unit is master. And this also makes it easy
>>>>>>>>> for tests to tell.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> > Adam suggested that I watch the status waiting for the next leader
>>>>>>>>> > election hook the wait on that and then check my service configs.
>>>>>>>>>
>>>>>>>>> You are best off waiting for all the hooks to complete and a steady
>>>>>>>>> state, not just leader elected (since things will still be in flux
>>>>>>>>> when that hook fires, such as the leader-settings-changed hooks it
>>>>>>>>> will probably trigger and the relation changes those hooks will
>>>>>>>>> likely trigger). Use the juju-wait plugin, and maybe add support to
>>>>>>>>> https://bugs.launchpad.net/juju-core/+bug/1488777 to get this into
>>>>>>>>> core.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Stuart Bishop <stuart.bis...@canonical.com>
>>>>>>>>>

-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju
