For the last day and a half I've been looking at this bug:
https://bugs.launchpad.net/juju-core/+bug/1401130

There's a lot of detail attached to the ticket but the short story is that
the Joyent cloud often allocates different internal networks to instances,
meaning that they can't communicate. From what I can tell from relevant LP
tickets, this has been a problem for a long time (perhaps always). It's
very hit and miss - sometimes you get allocated 10 machines in a row that
all end up with the same internal network, but more often than not it only
takes 2 or 3 machine additions before running into one that can't talk to
the others.
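
To make the symptom concrete, here is a minimal sketch that groups
instance addresses by the internal network they fall into. The addresses
and the /21 prefix length are made up for illustration and are not taken
from the ticket.

```python
import ipaddress
from collections import defaultdict


def group_by_network(addresses, prefix_len):
    """Group IPv4 addresses by the network they fall into at prefix_len."""
    groups = defaultdict(list)
    for addr in addresses:
        # ip_interface keeps the host bits; .network masks them off.
        net = ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        groups[str(net)].append(addr)
    return dict(groups)


# Hypothetical internal addresses for three machines (assumed values).
machines = ["10.112.1.5", "10.112.1.9", "10.224.3.17"]
groups = group_by_network(machines, 21)
for net, members in groups.items():
    print(net, members)
```

More than one group in the result means at least one machine sits on a
different internal network from the rest.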

I have found a forum post where someone from Joyent suggests adding a
static route for 10.0.0.0/8 to force all internal traffic down the internal
network interface. I've tried this out and it does indeed work. We *could*
have cloud-init install such a static route as new instances are configured,
but that's a pretty gross hack: it hardcodes an assumption in Juju about
Joyent's network setup, which will no doubt bite us down the track.
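
For reference, a rough sketch of what the cloud-init approach might look
like. It only builds a cloud-config document with a single runcmd entry;
the interface name "eth1" is an assumption about which NIC is the internal
one, and none of this is existing Juju code.

```python
import json


def static_route_cloud_config(cidr="10.0.0.0/8", device="eth1"):
    """Return a cloud-config document whose runcmd adds one static route.

    The device default is an assumption; the real internal interface
    name would have to be confirmed per image.
    """
    command = ["ip", "route", "add", cidr, "dev", device]
    # A JSON list is also a valid YAML flow sequence, so json.dumps
    # lets us emit the entry without needing a YAML library.
    return "#cloud-config\nruncmd:\n  - " + json.dumps(command) + "\n"


print(static_route_cloud_config())
```

The hardcoded 10.0.0.0/8 default is exactly the assumption about Joyent's
setup that makes this approach fragile.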

Another possible workaround could be to have machines on Joyent communicate
via their public addresses, ignoring the internal network. I'm not sure how
hard this is.
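
As a sketch of the selection step only (not of how Juju actually picks
addresses), preferring a public address could look like the following.
The function name and the sample addresses are hypothetical.

```python
import ipaddress


def pick_public_address(addresses):
    """Return the first address outside the private ranges, or None."""
    for addr in addresses:
        if not ipaddress.ip_address(addr).is_private:
            return addr
    return None


# Hypothetical address list for one machine: internal first, then public.
print(pick_public_address(["10.112.1.5", "165.225.128.10"]))
```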

Andrew has played around with the Joyent API and, curiously, the
ListNetworks API returns different networks from those that actually get
assigned to the instances. I hacked up the Joyent provisioner to use the
networks that ListNetworks returns, but that didn't seem to help.

I have opened a support ticket with Joyent to get clarification (no
response yet).

Given that this is looking like a problem/feature at Joyent's end that
needs clarification from them, may I suggest that this issue no longer be
allowed to block CI?

If there are other ideas about what's going on here, please speak up.

- Menno