Thank you Menno.

On Fri, Dec 12, 2014 at 12:01 AM, Menno Smits <menno.sm...@canonical.com> wrote:
> For the last day and a half I've been looking at this bug:
> https://bugs.launchpad.net/juju-core/+bug/1401130
>
> There's a lot of detail attached to the ticket but the short story is that
> the Joyent cloud often allocates different internal networks to instances,
> meaning that they can't communicate. From what I can tell from relevant LP
> tickets, this has been a problem for a long time (perhaps always). It's very
> hit and miss - sometimes you get allocated 10 machines in a row that all end
> up with the same internal network, but more often than not it only takes 2
> or 3 machine additions before running into one that can't talk to the
> others.

Your analysis explains a lot about the intermittent failures we
have observed in Juju CI for months.
...

> Given that this is looking like a problem/feature at Joyent's end that needs
> clarification from them, may I suggest that this issue is no longer allowed
> to block CI?

Speaking for users, there is a regression.

We have extensively tested master (1.22), 1.21, and 1.20 this week in
Joyent. Master always fails, whereas 1.21 and 1.20 pass, and are more
reliable than AWS, where we often see instances not available.

Juju 1.22 and Joyent just don't work (even for small deployments). We
know that 1.22 must get the agent from the state-server, whereas 1.20
and 1.21 will get it from streams or a local container. After the
machine agent is started, it calls home. Maybe the network changes
between the time of cloud-init and starting the agent. Maybe it
doesn't change fast enough and we get an intermittent failure.

As for the extensive testing: we have unlimited resources in Joyent,
so Juju QA is using it to subject changes to industrial testing
(repeatability). Using 1.22 built last week, as well as 1.21 and 1.20,
we saw high success rates, sometimes 100% for all Jujus. We tested
bundle deployments with 1.20.14, which gave us 100% success.

We do see intermittent failures using 1.20 in the Joyent cloud health
check, so we know statistically that the problem exists for every
Juju, but we are seeing 100% failure for master tip. The success rates
were better for master last week, and the rates for 1.20 and 1.21 have
been great every week.


-- 
Curtis Hovey
Canonical Cloud Development and Operations
http://launchpad.net/~sinzui
