Re: Uniter test failure from utils update

2016-08-11 Thread William Reade
(or, at least, it's something like that -- the general point is that you
shouldn't throw away values you don't know you don't need)

On Thu, Aug 11, 2016 at 2:14 PM, William Reade <william.re...@canonical.com>
wrote:

> worker/uniter/operation/runcommands.go:68... uses `context.ErrReboot` to
> indicate "there is a valid response value that should be handled, but we
> also need to queue a reboot". The change throws away the valid response.
>
> On Thu, Aug 11, 2016 at 1:47 AM, Martin Packman <
> martin.pack...@canonical.com> wrote:
>
>> Nicholas was struggling today to land his snapcraft pull request:
>>
>> <https://github.com/juju/juju/pull/5956>
>>
>> As a side effect this updates some dependencies, including a couple of
>> changes to juju/utils, which it turns out, causes
>> UniterSuite.TestRebootFromJujuRun to fail.
>>
>> Here's a run of master, which passes:
>>
>> <http://paste.ubuntu.com/22963025/>
>>
>> Then the failure, just with utils updated to rev 8a9dea0:
>>
>> <http://paste.ubuntu.com/22965595/>
>>
>> I'm not clear on how the new exec behaviour is actually causing the
>> test failure:
>>
>> <https://github.com/juju/utils/pull/203/files>
>>
>> So, any insights or suggestions welcome.
>>
>>
>> Note, the log also includes two other errors, but they appear in both
>> the passing and failing log, so are not actually affecting the test
>> outcome:
>>
>> [LOG] 0:01.631 ERROR juju.apiserver Unable to prime
>> /tmp/check-1447535164350529303/5/logsink.log (proceeding anyway):
>> chown /tmp/check-1447535164350529303/5/logsink.log: operation not
>> permitted
>>
>> This code should probably just not be being run under unit test, is
>> assuming root privs?
>>
>> [LOG] 0:03.078 ERROR juju.worker.uniter.context updating agent status:
>> cannot set invalid status "rebooting"
>>
>> The status setting code in uniter is a twisty confusion, I don't know
>> why this is happening or where the error is being swallowed.
>>
>> Martin
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
>> an/listinfo/juju-dev
>>
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Uniter test failure from utils update

2016-08-11 Thread William Reade
worker/uniter/operation/runcommands.go:68... uses `context.ErrReboot` to
indicate "there is a valid response value that should be handled, but we
also need to queue a reboot". The change throws away the valid response.

On Thu, Aug 11, 2016 at 1:47 AM, Martin Packman <
martin.pack...@canonical.com> wrote:

> Nicholas was struggling today to land his snapcraft pull request:
>
> 
>
> As a side effect this updates some dependencies, including a couple of
> changes to juju/utils, which it turns out, causes
> UniterSuite.TestRebootFromJujuRun to fail.
>
> Here's a run of master, which passes:
>
> 
>
> Then the failure, just with utils updated to rev 8a9dea0:
>
> 
>
> I'm not clear on how the new exec behaviour is actually causing the
> test failure:
>
> 
>
> So, any insights or suggestions welcome.
>
>
> Note, the log also includes two other errors, but they appear in both
> the passing and failing log, so are not actually affecting the test
> outcome:
>
> [LOG] 0:01.631 ERROR juju.apiserver Unable to prime
> /tmp/check-1447535164350529303/5/logsink.log (proceeding anyway):
> chown /tmp/check-1447535164350529303/5/logsink.log: operation not
> permitted
>
> This code should probably just not be being run under unit test, is
> assuming root privs?
>
> [LOG] 0:03.078 ERROR juju.worker.uniter.context updating agent status:
> cannot set invalid status "rebooting"
>
> The status setting code in uniter is a twisty confusion, I don't know
> why this is happening or where the error is being swallowed.
>
> Martin
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Let's talk retries

2016-08-09 Thread William Reade
On Tue, Aug 9, 2016 at 10:17 AM, roger peppe 
wrote:

> BTW I'm not saying that a timer queue is never the correct answer. In some
> circumstances, it can be the exactly the right thing to use.
>

Yeah -- the context here is that katco has been looking at the provisioner,
and I think a timer queue is sensible there. Firing off goroutines for
every machine is not necessarily bad by any means, but it *is* somewhat
harder to get right in every circumstance (e.g. what happens when a machine
is removed while that other goroutine is working on provisioning it?).

There are certainly many places where a single retryable op *is* the only
important thing and there's no call for a control loop. I'm not suggesting
we should always use a queue; but I'd tend to encourage it whenever we have
a component dealing with many similar retryable tasks. (I definitely should
be using it in dependency.Engine, for example.)

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: We recorded some thoughts on logging at the quartz sprint

2016-06-28 Thread William Reade
I'm not sure we should be sending any logging on stdout -- isn't that for
*results*, for consumption by other machines?

On Tue, Jun 28, 2016 at 11:52 AM, Reed O'Brien 
wrote:

> Relevant to the conversation currently happening at the core sprint, we
> recorded some thoughts on what to log at what level, when and where.
> Thoughts and examples appreciated.
>
>
> +---+---+---+-++
> | Level | Audience  | Purpose   | Example | I/O
> Stream |
>
> +---+---+---+-++
> | ERROR | End User  | Conveys information that requires | |
>  |
> |   |   | immediate action. Triggers call   | |
> STDERR   |
> |   |   | tree/pager duty.  | |
>  |
>
> +---+---+---+-++
> | WARN  | End User  | Conveys information that requires | |
>  |
> |   |   | user intervention soon to ensure  | |
> STDERR   |
> |   |   | continuos smooth operation.   | |
>  |
>
> +---+---+---+-++
> | INFO  | End User  | Conveys contextual "happy path"   | |
>  |
> |   |   | information about startup.| |
> STDOUT   |
>
> +---+---+---+-++
> | DEBUG | Developer | Conveys detailed function level   | |
>  |
> |   |   | information to aid in isolating   | |
> STDOUT   |
> |   |   | code paths during devleopement.   | |
>  |
>
> +---+---+---+-++
> | TRACE | Developer | Conveys detailed branch level | |
>  |
> |   |   | information to aid in isolating   | |
> STDOUT   |
> |   |   | code paths during development.| |
>  |
>
> +---+---+---+-++
>
> --
> Reed O'Brien
> ✉ reed.obr...@canonical.com
> ✆ 415-562-6797
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Machine agents uninstall themselves upon worker.ErrTerminateAgent.

2016-05-09 Thread William Reade
On Mon, May 9, 2016 at 11:03 AM, Andrew Wilkins <
andrew.wilk...@canonical.com> wrote:

Meta: a name like "uninstall-agent" is really misleading if it actually
>> means "machine-X is definitely dead". I never had the slightest indication
>> that was what it was meant to mean. But in that case, why *don't* we write
>> the file on certain API connection failures? There is logic that
>> *explicitly* checks if the machine is Dead -- surely that's a trigger too?
>>
>
> We do, but it looks like we've regressed since the MADE branch landed.
> api.ErrConnectImpossible now causes uninstall-agent, which means a
> (possibly temporary) authorization error will once again cause an uninstall:
>
> https://github.com/juju/juju/blob/master/worker/apicaller/connect.go#L176
>

Yeah, we discussed that at the time and evidently something went wrong -- I
was still misconceiving uninstall-agent as a manual-specific,
provision-time-only switch, which had been incorrectly overloaded with
is-it-dead-no-matter-the-provider meaning.


>  (1) record (in agent config?) that a machine is manual
>>>  (2) only ever do anything uninstall-related for manual machines
>>>  (3) only ever do uninstall-related things if the machine actually is
>>> Dead
>>>  (4) drop lxc-specific logic from uninstall *when LXC support is removed*
>>>
>>
>> Generally SGTM, but confused re (4) -- doesn't that have to be
>> fixed/moved/removed first earlier, so that we can switch off the suicide
>> behaviour in other cases without regressing?
>>
>
> We can remove the LXC/loop bits now if we're fine with leaking loop
> devices, which is probably OK assuming we are actually removing LXC before
> 2.0. Loop devices has to be explicitly enabled anyway.
>

Sounds like a plan. Agent config seems like a decent place to store
manual-ness; and having a specific ErrAgentEntityDead STM to be the best
signal for triggering a check and potential uninstall.

(in fact: for great safety, it can and probably should be
cmd/jujud/agent.errEntityDead, which gets set in a manifold- or
engine-config Filter field -- so it only ever gets injected in response to
specific errors from specific workers. It's unquestionably the agent's
decision to make; and it's the workers' responsibility to return precise
errors, rooted in their own contexts, that don't prejudge how the client
might want to respond.)

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Machine agents uninstall themselves upon worker.ErrTerminateAgent.

2016-05-09 Thread William Reade
On Mon, May 9, 2016 at 9:56 AM, Andrew Wilkins  wrote:

> The reason it's done at the last moment is to avoid having dangling
>> database entries. If we uninstall the agent (i.e. delete /var/lib/juju,
>> remove systemd/upstart), then if the agent fails before we get to
>> EnsureDead, then the entity will never be removed from state.
>>
>
> The *only* thing that should happen after setting dead is the uninstall --
> anything else that's required to happen before cleanup *must* happen before
> setting dead, which *means* "all my responsibilities are 100% fulfilled".
>

> I don't think I suggested above that we should do anything else other
than uninstall?

Sorry I misunderstood -- I was focused on the loop-device stuff that had
grown in the uninstall logic.


> The *only* justification for the post-death logic in the manual case is
>> because there's no responsible provisioner component to hand over to -- and
>> frankly I wish we'd just written that to SSH in and clean up, instead of
>> taking on this ongoing hassle.
>>
>
>>
> As an alternative, we could (should) only ever write the
>>> /var/lib/juju/uninstall-agent file from worker/machiner, first checking
>>> there's no assigned units, and no storage attached.
>>>
>>
>> Why would we *ever* want to write it at runtime? We know if it's a manual
>> machine at provisioning time, so we can write the File Of Death OAOO. All
>> the other mucking about with it is the source of these (serious!) bugs.
>>
>
> The point is not to distinguish between manual vs. non-manual. Yes, we can
> write something that records that fact OAOO.
>
> The point of "write from the machiner" was to signal that the machine is
> actually dead, and removed from state, vs. "my credentials are invalid,
> better shut down now".
>

Yeah, we definitely shouldn't return ErrTerminateAgent from apicaller on
mere invalid-credentials. (No worker should return ErrTerminateAgent *at
all*, really -- not their job. Having a couple of different workers that
can return ErrAgentEntityDead, which can then be interpreted by the next
layer? Fine :).)


> So we can write a file to confine uninstall to manual machines -- that
> much is easy, I don't think anyone will disagree with doing that. But we
> should not ignore the bug that prompted this thread, even if it's confined
> to manual machines.
>

Right; and, yeah, it's almost certainly better to leak manual machines than
it is to uninstall them accidentally -- so long as we know that's the
tradeoff we're making ;).


> Andrew, I think you had more detail last time we discussed this: is there
 anything else in uninstall (besides loop-device stuff) that needs to run
 *anywhere* except a manual machine? and, what will we actually need to sync
 with in the machiner? (or, do you have alternative ideas?)

>>>
>>> No, I don't think there is anything else to be done in uninstall, apart
>>> from loop detach and manual machine cleanup. I'm not sure about moving the
>>> uninstall logic to the machiner, for reasons described above. We could
>>> improve the current state of affairs, though, by only writing the
>>> uninstall-agent file from the machiner
>>>
>>
>> Strong -1 on moving uninstall logic: if it has to happen (which it does,
>> in *rare* cases that are *always* detectable pre-provisioning), uninstall
>> is where it should happen, post-machine-death; and also strong -1 on
>> writing uninstall-agent in *any* circumstances except manual machine
>> provisioning, we have had *way* too many problems with this "clever"
>> feature being invoked when it shouldn't be.
>>
>
> I don't want to belabour the point, but just to be clear: the
> uninstall-agent file exists to record the fact that he machine is in fact
> Dead, and uninstall should go ahead. That logic was put in specifically to
> prevent the referenced bug. We can and should improve it to only do this
> for manual machines.
>

Meta: a name like "uninstall-agent" is really misleading if it actually
means "machine-X is definitely dead". I never had the slightest indication
that was what it was meant to mean. But in that case, why *don't* we write
the file on certain API connection failures? There is logic that
*explicitly* checks if the machine is Dead -- surely that's a trigger too?

FWIW, the loop stuff can be dropped when the LXC container support is
>>> removed. Nobody ever added support for loop in the LXD provider, and I
>>> think we should implement support for it differently to how it was done for
>>> LXC anyway (losetup on host, expose to container; as opposed to expose all
>>> loop devices to all LXD containers and losetup in container).
>>>
>>
>> +1000 to that. So... can't we just (1) fix the manual provisioning to
>> write the file; (2) drop all other use of uninstall-agent; (3) drop the
>> lxc-specific logic in uninstall -- and then we're done?
>>
>
> For first steps, I think so. But I do think we should fix
> 

Re: Machine agents uninstall themselves upon worker.ErrTerminateAgent.

2016-05-09 Thread William Reade
On Mon, May 9, 2016 at 3:28 AM, Andrew Wilkins <andrew.wilk...@canonical.com
> wrote:

> On Sat, May 7, 2016 at 1:37 AM William Reade <william.re...@canonical.com>
> wrote:
>
>> On Fri, May 6, 2016 at 5:50 PM, Eric Snow <eric.s...@canonical.com>
>> wrote:
>>
>>> See https://bugs.launchpad.net/juju-core/+bug/1514874.
>>
>>
> So I think this issue is fixed in 2.0, but looks like the changes never
> got backported to 1.25. From your options, we do have (the opposite of) a
> DO_NOT_UNINSTALL file (it's actually called
> "/var/lib/juju/uninstall-agent"; only if it exists do we uninstall).
>
> (And now that I think of it, we're only writing uninstall-agent for the
> manual provider's bootstrap machine, and not other manual machines, so
> we're currently leaving Juju bits behind on manual machines added to an
> environment.)
>

Except we're *also* writing it on every machine, for Very Bad Reasons,
right? So we *are* still cleaning up all machines, but there's a latent
manual provider bug that'll need addressing.


> The reason it's done at the last moment is to avoid having dangling
> database entries. If we uninstall the agent (i.e. delete /var/lib/juju,
> remove systemd/upstart), then if the agent fails before we get to
> EnsureDead, then the entity will never be removed from state.
>

The *only* thing that should happen after setting dead is the uninstall --
anything else that's required to happen before cleanup *must* happen before
setting dead, which *means* "all my responsibilities are 100% fulfilled".
The *only* justification for the post-death logic in the manual case is
because there's no responsible provisioner component to hand over to -- and
frankly I wish we'd just written that to SSH in and clean up, instead of
taking on this ongoing hassle.

As an alternative, we could (should) only ever write the
> /var/lib/juju/uninstall-agent file from worker/machiner, first checking
> there's no assigned units, and no storage attached.
>

Why would we *ever* want to write it at runtime? We know if it's a manual
machine at provisioning time, so we can write the File Of Death OAOO. All
the other mucking about with it is the source of these (serious!) bugs.

Andrew, I think you had more detail last time we discussed this: is there
>> anything else in uninstall (besides loop-device stuff) that needs to run
>> *anywhere* except a manual machine? and, what will we actually need to sync
>> with in the machiner? (or, do you have alternative ideas?)
>>
>
> No, I don't think there is anything else to be done in uninstall, apart
> from loop detach and manual machine cleanup. I'm not sure about moving the
> uninstall logic to the machiner, for reasons described above. We could
> improve the current state of affairs, though, by only writing the
> uninstall-agent file from the machiner
>

Strong -1 on moving uninstall logic: if it has to happen (which it does, in
*rare* cases that are *always* detectable pre-provisioning), uninstall is
where it should happen, post-machine-death; and also strong -1 on writing
uninstall-agent in *any* circumstances except manual machine provisioning,
we have had *way* too many problems with this "clever" feature being
invoked when it shouldn't be.

FWIW, the loop stuff can be dropped when the LXC container support is
> removed. Nobody ever added support for loop in the LXD provider, and I
> think we should implement support for it differently to how it was done for
> LXC anyway (losetup on host, expose to container; as opposed to expose all
> loop devices to all LXD containers and losetup in container).
>

+1000 to that. So... can't we just (1) fix the manual provisioning to write
the file; (2) drop all other use of uninstall-agent; (3) drop the
lxc-specific logic in uninstall -- and then we're done?

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Machine agents uninstall themselves upon worker.ErrTerminateAgent.

2016-05-06 Thread William Reade
On Fri, May 6, 2016 at 11:26 PM, Eric Snow <eric.s...@canonical.com> wrote:

> On Fri, May 6, 2016 at 11:37 AM, William Reade
> <william.re...@canonical.com> wrote:
> > On Fri, May 6, 2016 at 5:50 PM, Eric Snow <eric.s...@canonical.com>
> wrote:
> >>
> >> See https://bugs.launchpad.net/juju-core/+bug/1514874.
> >
> >
> > Mainly, we just shouldn't do it. The only *actual reason* for us to do
> this
> > is to support the manual provider; any other machine that happens to be
> dead
> > will be cleaned up by the responsible provisioner in good time. There's a
> > file we write to enable the suicidal behaviour when we enlist a manual
> > machine, and we *shouldn't* have ever written it for any other reason.
>
> So how about, other than the container bits, we drop the "uninstall"
> code entirely.  To address the manual provider case, during that
> provider's StartInstance() we add a "clean-up-juju" script somewhere
> on the machine?  Then folks can use that to confidently clean up after
> the fact.
>

I'm not keen to regress that without *very* compelling reasons to do so;
and a bit suspicious that installing a clean-up-juju script is itself going
to be a bit of a rabbit hole (once juju's upgraded, the things that need to
be cleaned up might change; at the very least that's an upgrade step to
maintain, but it's also way too easy to just forget that the stored script
could go out of date). Keeping it (as much as possible) in-process strikes
me as more generally reliable; the main thing is dialing way back on the
set of scenarios that can trigger it.

Yeah, if we simply did no automatic cleanup and (on manual provider)
> give users a script that would let them clean up afterward, then they
> could happily(?) manually repair their machines with much less
> trouble.
>

Still -1 myself: as you imply, I'm not sure anyone using this feature will
take on a per-machine manual step particularly "happily" ;).

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Machine agents uninstall themselves upon worker.ErrTerminateAgent.

2016-05-06 Thread William Reade
On Fri, May 6, 2016 at 5:50 PM, Eric Snow  wrote:

> See https://bugs.launchpad.net/juju-core/+bug/1514874.
>

Mainly, we just shouldn't do it. The only *actual reason* for us to do this
is to support the manual provider; any other machine that happens to be
dead will be cleaned up by the responsible provisioner in good time.
There's a file we write to enable the suicidal behaviour when we enlist a
manual machine, and we *shouldn't* have ever written it for any other
reason.

But then people started adding post-death cleanup logic to the agent, and
the only way to trigger that was to write that file; so they started
writing that file on normal shutdown paths, so that they could trigger the
post-death logic in all cases, and I can only assume they decided that
nuking the installation was acceptable precisely *because* the provisioner
would be taking down the machine anyway. And note that because there *is* a
responsible provisioner that will be taking the machine down, that shutdown
logic might or might not happen, rendering it ultimately pretty unreliable
[0] as well as spectacularly terrible in the lp:1514874 cases. /sigh

So, narrowly, fixing this involves relocating the more-widely-useful
(in-container loop device detach IIRC?) code inside MachineAgent.uninstall;
and then just not ever writing the file that triggers actual uninstall,
putting it back to where it was intended: a nasty but constrained hack to
avoid polluting manual machines (which are only "borrowed") any more than
necessary.

(A bit more broadly: ErrTerminateAgent is kinda dumb, but not
*particularly* harmful in practice [1] without the file of death backing it
up. The various ways in which, e.g. api connection failure can trigger it
are a bit enthusiastic, perhaps, but if *all it did was terminate the
agent* it'd be fine: someone who, e.g. changed their agent.conf's password
would invoke jujud and see it stop -- can't do anything with this config --
but be perfectly able to work again if the conf were fixed. Don't know what
the deal is with the correct-password-refused thing, though: that should be
investigated further.)

The tricky bit is relocating the cleanup code in uninstall -- I suspect the
reason this whole sorry saga kicked off is because whoever added it was in
a hurry and thought they didn't have a place to put this logic (inside the
machiner, *before* setting Dead, would be closest to correct -- but that'd
require us to synchronise with any other workers that might use the
resources we want to clean up, and I suspect that would have been the
roadblock). Andrew, I think you had more detail last time we discussed
this: is there anything else in uninstall (besides loop-device stuff) that
needs to run *anywhere* except a manual machine? and, what will we actually
need to sync with in the machiner? (or, do you have alternative ideas?)

Cheers
William

[0] although with the watcher resolution of 5s, it's actually got a pretty
good chance of running.
[1] still utterly terrible in theory, it would be lovely to see explicit
handover between contexts -- e.g. it is *not* the machiner's job to return
ErrTerminateAgent, it is whatever's-responsible-for-the-machiner's job to
handle a local ErrMachineDead and convert that as necessary for the context
it's running in.

PS Oh, uh, couple of comments on the below:

1. do nothing
> This would be easy :) but does not help with the pain point.
> 2. be smarter about retrying (e.g. retry connecting to API) when
> running into fatal errors
> This would probably be good to do but the effort might not pay for
> itself.
> 3. do not clean up (data dir, init service, or either)
> Leaving the init service installed really isn't an option because
> we don't want
> the init service to try restarting the agent over and over.
> Leaving the data dir
> in place isn't good because it will conflict with any new agent
> dir the controller
> tries to put on the instance in the future.
>

we don't need to worry about the init service because we're set up not to
be restarted if we return 0, which we should on ErrTerminateAgent.


> 4. add a DO_NOT_UNINSTALL or DO_NOT_CLEAN_UP flag (e.g. in the
> machine/model config or as a sentinel file on instances) and do not
> clean up if it is set to true (default?)
> This would provide a reasonable quick fix for the above bug, even if
> temporary and even if it defaults to false.
> 5. disable (instead of uninstall) the init services and move the data
> dir to some obvious but out-of-the-way place (e.g.
> /home/ubuntu/agent-data-dir-XXMODELUUIDXX-machine-X) instead of
> deleting it
> This is a reasonable longer term solution as the concerns
> described for 3 are
> addressed and manual intervention becomes more feasible.
> 6. same as 5 but also provide an "enable-juju" (or "revive-juju",
> "repair-juju", etc.) command *on the machine* that would re-enable the
> init services and restore (or rebuild) the agent's data 

Re: Move provider implementations under their related projects.

2016-03-25 Thread William Reade
On Thu, Mar 24, 2016 at 10:12 PM, Eric Snow  wrote:

> Perhaps we should move the implementation of some providers over to
> the repos for the projects with which the providers interact (and then
> just import them in providers/all).  Then those providers would be
> more likely to stay up-to-date in the face of changes in the project,
> particularly backward-incompatible ones.  For example:
>
> * provider/maas -> github.com/juju/gomaasapi/maasprovider
> * provider/lxd -> github.com/lxc/lxd/lxdprovider
> * ...
>
> or something like that.  It's not a perfect solution nor a complete
> one, but it should help.
>

Is the intent to make it hard to change the library project without
updating the provider? I understand the impulse, but ISTM that this will
make it harder to work on those projects and not actually lead to any
better outcomes for juju. Ultimately, isn't it our job to keep up with
changes to the libraries we depend upon?

Cheers
William


>
> -eric
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Usability issues with status-history

2016-03-20 Thread William Reade
On Sun, Mar 20, 2016 at 2:51 AM, Andrew Wilkins <
andrew.wilk...@canonical.com> wrote:

> +1 on collapsing repeats. I'd also prefer to add more data to status so
> that we can collapse entries, rather than dropping data so that we
> don't/can't see it in history.
>
> What do we base "sensible filtering" on? Exact match on message isn't
> enough for download messages, obviously, and I'd be very hesitant to bake
> in any knowledge of specific message formats.
>

Yeah, message formats would be a pretty poor basis for similarity.

IIANM, our status entries can carry additional data which we don't render.
> If we add the concept of overarching operations to status entries (e.g.
> each "image download progress" entry is part of the "image download"
> operation), then we could collapse all adjacent entries within that
> operation. This could be a simple string in the status data; or we could
> extend the status schema. Either way.
>

YANM :). I'm pretty sure that adding a value to status data is the easiest
approach; and, given that we can't change all the code at once, I think
it's also better to have the maybe-present value explicitly in the
status-data-bag rather than part of the schema (and hence subtly implied to
be more reliable than it actually is). Thoughts?

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Usability issues with status-history

2016-03-19 Thread William Reade
On Sat, Mar 19, 2016 at 4:39 AM, Ian Booth  wrote:

>
> I mostly agree but still believe there's a case for transient messages.
> The case
> where Juju is downloading an image and emits progress updates which go into
> status history is to me clearly a case where we needn't persist every
> single one
> (or any). In that case, it's not a charmer deciding but Juju. And with
> status
> updates like X% complete, as soon as a new message arrives, the old one is
> superseded anyway. The user is surely just interested to know the current
> status
> and when it completes they don't care anymore. And Juju agent can still
> decide
> to say make every 10% of download progress messages non-transient to they
> go to
> history for future reference.
>

There are two distinct problems: collecting the data, and presenting
information gleaned from that data. Adding complexity to the first task in
the hope of simplifying the second mixes the concerns at a very deep level,
and makes the whole stack harder to understand for everybody.

Would this work s an initial improvement for 2.0:
>
> 1. Increase limit of stored messages per entity so say 500 (from 100)
>

Messages-per-entity seems like a strange starting point, compared with
either max age or max data size (or both). Storage concerns don't seem like
a major risk: we're keeping a max 3 days/4 gigabytes of normal log messages
in the database already, and I rather doubt that SetStatus calls generate
anything like that magnitude of data. Shouldn't we just be following the
same sort of trimming strategy there and leaving the dataset otherwise
uncontaminated, and hence as useful as possible?

2. Allow messages emitted from Juju to be marked as transient
> eg for download progress
>

-1, it's just extra complexity to special-case a particular kind of status
in exchange for very questionable storage gains, and muddies the dataset to
boot.


> 3. Do smarter filtering of what is displayed with status-history
> eg if we see the same tuple of messages over and over, consolidate
>
> TIMETYPESTATUS  MESSAGE
> 26 Dec 2015 13:51:59Z   agent   executing   running config-changed hook
> 26 Dec 2015 13:51:59Z   agent   idle
> 26 Dec 2015 13:56:57Z   agent   executing   running update-status hook
> 26 Dec 2015 13:56:59Z   agent   idle
> 26 Dec 2015 14:01:57Z   agent   executing   running update-status hook
> 26 Dec 2015 14:01:59Z   agent   idle
> 26 Dec 2015 14:01:57Z   agent   executing   running update-status hook
> 26 Dec 2015 14:01:59Z   agent   idle
>
> becomes
>
> TIME TYPE STATUS MESSAGE
> 26 Dec 2015 13:51:59Z agent executing running config-changed hook
> 26 Dec 2015 13:51:59Z agent idle
> >> Repeated 3 times, last occurence:
> 26 Dec 2015 14:01:57Z agent executing running update-status hook
> 26 Dec 2015 14:01:59Z agent idle
>

+100 to this sort of thing. It won't be perfect, but where it's imperfect
we'll be able to see how to improve. And if we're always calculating it
from the source data, we can improve the presentation/analytics and fix
those bugs in isolation; if we mangle the data at collection time we
sharply limit our options in that arena. (And surely sensible filtering
will render the transient-download-message problem moot *anyway*, leaving
us less reason to worry about (2)?)

Cheers
William



>
>
>
>
> > On Thu, Mar 17, 2016 at 6:30 AM, John Meinel 
> wrote:
> >
> >>
> >>
> >> On Thu, Mar 17, 2016 at 8:41 AM, Ian Booth 
> >> wrote:
> >>
> >>>
> >>> Machines, services and units all now support recording status history.
> Two
> >>> issues have come up:
> >>>
> >>> 1. https://bugs.launchpad.net/juju-core/+bug/1530840
> >>>
> >>> For units, especially in steady state, status history is spammed with
> >>> update-status hook invocations which can obscure the hooks we really
> care
> >>> about
> >>>
> >>> 2. https://bugs.launchpad.net/juju-core/+bug/1557918
> >>>
> >>> We now have the concept of recording a machine provisioning status.
> This
> >>> is
> >>> great because it gives observability to what is happening as a node is
> >>> being
> >>> allocated in the cloud. With LXD, this feature has been used to give
> >>> visibility
> >>> to progress of the image downloads (finally, yay). But what happens is
> >>> that the
> >>> machine status history gets filled with lots of "Downloading x%" type
> >>> messages.
> >>>
> >>> We have a pruner which caps the history to 100 entries per entity. But
> we
> >>> need a
> >>> way to deal with the spam, and what is displayed when the user asks for
> >>> juju
> >>> status-history.
> >>>
> >>> Options to solve bug 1
> >>>
> >>> A.
> >>> Filter out duplicate status entries when presenting to the user. eg say
> >>> "update-status (x43)". This still allows the circular buffer for that
> >>> entity to
> >>> fill with "spam" though. We could make the circular buffer size much
> >>> larger. But
> >>> there's still the issue of 

Re: Usability issues with status-history

2016-03-19 Thread William Reade
I see this as a combination of two problems:

1) We're spamming the end user with "whatever's in the status-history
collection" rather than presenting a digest tuned for their needs.
2) Important messages get thrown away way too early, because we don't know
which messages are important.

I think the pocket/transient/expiry solutions boil down to "let's make the
charmer decide what's important", and I don't think that will help. The
charmer is only sending those messages *because she believes they're
important*; even if we had "perfect" trimming heuristics for the end user,
we do the *charmer* a disservice by leaving them no record of what their
charm actually did.

And, more generally: *every* message we throw away makes it hard to
correctly analyse any older message. This applies within a given entity's
domain, but also across entities: if you're trying to understand the
interactions between 2 units, but one of those units is generating many
more messages, you'll have 200 messages to inspect; but the 100 for the
faster unit will only cover (say) the last 30 for the slower one, leaving
70 slow-unit messages that can't be correlated with the other unit's
actions. At best, those messages are redundant; at worst, they're actively
misleading.

So: I do not believe that any approach that can be summed up as "let's
throw away *more* messages" is going to help either. We need to fix (2) so
that we have raw status data that extends reasonably far back in time; and
then we need to fix (1) so that we usefully precis that data for the user
(...and! leave a path that makes the raw data observable, for the cases
where our heuristics are unhelpful).

Cheers
William

PS re: UX of asking for N entries... I can see end-user stories for
timespans, and for "the last N *significant* changes". What's the scenario
where a user wants to see exactly 50 message atoms?

On Thu, Mar 17, 2016 at 6:30 AM, John Meinel  wrote:

>
>
> On Thu, Mar 17, 2016 at 8:41 AM, Ian Booth 
> wrote:
>
>>
>> Machines, services and units all now support recording status history. Two
>> issues have come up:
>>
>> 1. https://bugs.launchpad.net/juju-core/+bug/1530840
>>
>> For units, especially in steady state, status history is spammed with
>> update-status hook invocations which can obscure the hooks we really care
>> about
>>
>> 2. https://bugs.launchpad.net/juju-core/+bug/1557918
>>
>> We now have the concept of recording a machine provisioning status. This
>> is
>> great because it gives observability to what is happening as a node is
>> being
>> allocated in the cloud. With LXD, this feature has been used to give
>> visibility
>> to progress of the image downloads (finally, yay). But what happens is
>> that the
>> machine status history gets filled with lots of "Downloading x%" type
>> messages.
>>
>> We have a pruner which caps the history to 100 entries per entity. But we
>> need a
>> way to deal with the spam, and what is displayed when the user asks for
>> juju
>> status-history.
>>
>> Options to solve bug 1
>>
>> A.
>> Filter out duplicate status entries when presenting to the user. eg say
>> "update-status (x43)". This still allows the circular buffer for that
>> entity to
>> fill with "spam" though. We could make the circular buffer size much
>> larger. But
>> there's still the issue of UX where a user ask for the X most recent
>> entries.
>> What do we give them? The X most recent de-duped entries?
>>
>> B.
>> If the we go to record history and the current previous entry is the same
>> as
>> what we are about to record, just update the timestamp. For update
>> status, my
>> view is we don't really care how many times the hook was run, but rather
>> when
>> was the last time it ran.
>>
>
> The problem is that it isn't the same as the "last" message. Going to the
> original paste:
>
> TIMETYPESTATUS  MESSAGE
> 26 Dec 2015 13:51:59Z   agent   idle
> 26 Dec 2015 13:56:57Z   agent   executing   running update-status hook
> 26 Dec 2015 13:56:59Z   agent   idle
> 26 Dec 2015 14:01:57Z   agent   executing   running update-status hook
> 26 Dec 2015 14:01:59Z   agent   idle
>
> Which means there is an "running update-status" *and* a "idle" message.
> So we can't just say "is the last message == this message". It would have
> to look deeper in history, and how deep should we be looking? what happens
> if a given charm does one more "status-set" during its update-status hook
> to set the status of the unit to "still happy". Then we would have 3.
> (agent executing, unit happy, agent idle)
>
>
>> Options to solve bug 2
>>
>> A.
>> Allow a flag when setting status to say "this status value is transient"
>> and so
>> it is recorded in status but not logged in history.
>>
>> B.
>> Do not record machine provisioning status in history. It could be argued
>> this
>> info is more or less transient and once the machine comes up, we don't
>> care so
>> much about it anymore. 

Re: Tags and object IDs

2016-01-25 Thread William Reade
On Mon, Jan 25, 2016 at 12:07 PM, Nate Finch 
wrote:

> I was really trying not to give too much information about this exact
> case, so we could avoid talking about a specific implementation, and focus
> on the more general question of how we identify objects.  Yes, we get the
> bytes using an HTTP request, but that is irrelevant to my question :)
>

I thought I did answer the question:

But whenever we do record the unit-X-uses-resource-Y info I assume we'll
>>> have much the same stuff available in the apiserver, in which case I think
>>> you just want to pass the *Unit back into state; without it, you just need
>>> to read the doc from the DB all over again to make appropriate
>>> liveness/existence checks [0], and why bother unless you've already hit an
>>> assertion failure in your first txn attempt?
>>>
>>
...but perhaps I misunderstood what you were looking for?

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Defaulting tests internal to the package

2016-01-23 Thread William Reade
On Sat, Jan 23, 2016 at 12:27 AM, David Cheney <david.che...@canonical.com>
wrote:

> I agree with Nate, using external tests just adds more boilerplate.
>

I agree that it's often *harder* to write external tests; but it's always
harder to write code that's reliable (and, specifically, here, resilient in
the face of distracted/hurried maintenance programming, which is a
perennial problem).

But I'd love to be shown where I'm wrong. Do you have any counterpoints to
my statements about the practical impact of internal tests?

Cheers
William


> On Sat, Jan 23, 2016 at 7:23 AM, William Reade
> <william.re...@canonical.com> wrote:
> > On Fri, Jan 22, 2016 at 5:17 PM, Nate Finch <nate.fi...@canonical.com>
> > wrote:
> >>
> >> I'm glad to hear Roger's opinion about testing internal code... that's
> >> exactly how I feel.  True unit tests of small bits of code are easy to
> >> write, easy to read and understand, and give you confidence that your
> code
> >> is doing what you think it'll do in a wide variety of cases.  If the
> unit
> >> test fails, it's generally incredibly clear where the fault in the code
> >> lies, and it's easy to fix either the code or the test.  In 6 months,
> you or
> >> someone else can go back to the tests, easily understand what they are
> >> testing, and modify them trivially when making a bug fix or improvement.
> >
> >
> > Tests for unexported utility functions can give you great confidence in
> the
> > operation of those unexported utility functions... but those tests will
> > continue to pass even if you change the package so that they're never
> > actually used, and that renders those tests' diagnostic power pretty much
> > worthless. And, to make things worse, that sort of test is very
> frequently
> > used as a justification for skipping tests of the same functionality
> against
> > the exported interface.
> >
> >>
> >> While you certainly *can* write code that tests all the corner cases of
> >> some small utility function through the external API... those tests will
> >> almost always be incredibly opaque and hard to understand, and generally
> >> much more complicated than they would be if you could just test the
> utility
> >> function by itself.  This is often a problem I have with our current
> >> tests... it's hard to see what is actually being tested, and even
> harder to
> >> verify that the test itself is correct and complete.  When a test
> fails, you
> >> often have no clue where the actual problem lies, because the test
> traverses
> >> so much code.
> >
> >
> > I think you're misdiagnosing the problems with our current tests. Most of
> > our rubbish "unit" tests are rubbish because they're not unit tests at
> all:
> > they're full-stack tests that are utterly and inextricably bound up with
> the
> > implementation details of *all* the layers underneath, and depend
> critically
> > upon assumptions about how those other layers work. And, yes, one of the
> > worst things about those tests is how hard it is to diagnose what's
> actually
> > happened when they fail.
> >
> > But that's really *not* a good reason to add *more* implicit assumptions
> > about how various components are connected together, which is exactly
> what
> > you're doing when you write tests for unexported implementation details.
> If
> > your utility function tests fail, but the package tests don't, all it
> means
> > is that you've written bad package tests; or, if your package tests *are*
> > complete and correct, an isolated utility-test failure just means you've
> > wasted time writing tests for functionality that delivers no observable
> > value.
> >
> >> Of course, I definitely think you *also* need tests of the exported API
> of
> >> a package... but small unit tests are still incredibly valuable.
> >
> >
> > "Small unit tests" are tremendously valuable, I agree. And if your tests
> for
> > the exported API are *not* small unit tests, you're Doing It Wrong. But
> unit
> > tests for unexported functions are *by definition* tests for
> implementation
> > details, not for behaviour, and as such have *negative* value; and not
> just
> > because they fossilise the code, or even just because they give you false
> > confidence, but because they remove any pressure to export a good
> interface.
> >
> > To put it another way: if it's hard to write tests to exercise X internal
> > functionality, either that functionality is not important... or, wors

Re: Tags and object IDs

2016-01-22 Thread William Reade
On Fri, Jan 22, 2016 at 10:28 PM, William Reade <william.re...@canonical.com
> wrote:
>
> [0] I imagine you're not just dumping (unit, resource) pairs into the DB
> without checking that they're sane? that's really not safe
>

(Specifically, you need to check that they're *still* sane whenever that
txn happens to apply. Just because they were sane at the time of the api
call is no indication of anything.)
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Tags and object IDs

2016-01-22 Thread William Reade
Need a bit more context here. What layer are you working in?

In general terms, entity references in the API *must* use tags; entity
references that leak out to users *must not* use tags; otherwise it's a
matter of judgment and convenience. In state code, it's annoying to use
tags because we've already got the globalKey convention; in worker code
it's often justifiable if not exactly awesome. See
https://github.com/juju/juju/wiki/Managing-complexity#workers

Cheers
William

On Fri, Jan 22, 2016 at 6:02 PM, Nate Finch 
wrote:

> I have a function that is recording which unit is using a specific
> resource.  I wrote the function to take a UnitTag, because that's the
> closest thing we have to an ID type. However, I and others seem to remember
> hearing that Tags are really only supposed to be used for the API. That
> leaves me with a problem - what can I pass to this function to indicate
> which unit I'm talking about?  I'd be fine passing a pointer to the unit
> object itself, but we're trying to avoid direct dependencies on state.
> People have suggested just passing a string (presumably
> unit.Tag().String()), but then my API is too lenient - it appears to say
> "give me any string you want for an id", but what it really means is "give
> me a serialized UnitTag".
>
> I think most places in the code just use a string for an ID, but this
> opens up the code to abuses and developer errors.
>
> Can someone explain why tags should only be used in the API? It seems like
> the perfect type to pass around to indicate the ID of a specific object.
>
> -Nate
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Defaulting tests internal to the package

2016-01-22 Thread William Reade
On Thu, Jan 21, 2016 at 9:55 PM, Nate Finch 
wrote:

> [reposting this to the wider juju-dev mailing list]
>
> So, I hit an interesting problem a while back.  I have some unit tests
> that need to be internal tests, thus they are in 'package foo'.  However,
> my code needs to use some testhelper functions that someone else wrote...
> which are in 'package foo_test'.  It's impossible to reference those, so I
> had to move those helpers to package foo, which then either requires that
> we make them exported (which is exactly like using export_test.go, which,
> in the juju-core Oakland sprint, we all agreed was bad), or all tests that
> use the helpers need to be in 'package foo'... which means I had to go
> change a bunch of files to be in package foo, and change the calls in those
> files from foo.SomeFunc() to just SomeFunc().
>
> Given the assumption that some tests at some point will make sense to be
> internal tests, and given it's likely that helper functions/types will want
> to be shared across suites - should we not just always make our tests in
> package foo, and avoid this whole mess in the first place?
>

This is a poor justification. The only reason to write internal tests is
when you *really can't* write those tests against the exported interface;
and the only legitimate reason to accept that situation is when you're
dealing with terrible legacy code and *really don't* have the time to
rewrite a whole package properly.

(A note, this happened again today - I wanted to add a unit test of a
> non-exported function to an existing test suite, and can't because the unit
> tests are in the foo_test package)
>

You're doing it wrong. Either export the system you want to test (and
inject it explicitly into clients) or write a fixture for the exported bit
that lets you demonstrate that the system is hooked up. Anything else is
mere cargo-culting and grants neither the verification nor the design
benefits that testing should deliver.

There seems to only be two concrete benefits to putting tests in package
> foo_test:
> 1. It avoids circular dependencies if your test needs to import something
> that imports the package you're testing.
> 2. It makes your tests read like normal usages of your package, i.e.
> calling foo.SomeFunc().
> The first is obviously non-negotiable when it comes up... but I think it's
> actually quite rare (and might indicate a mixture of concerns that warrants
> investigation).  The second is nice to have, but not really essential (if
> we want tests that are good examples, we can write example functions).
>

There are two main reasons to write tests:

1) Tests let you verify how the code you run in production will actually
act.
2) Tests are a second client for the interface you create, and thus apply
early pressure towards good/sane interface design.

Writing package-internal tests subverts both goals, and gives us a *tiny*
bit of short-term convenience in return. It's a terrible trade-off.

So, I propose we put all tests in package foo by default.  For those devs
> that want to only test the exported API of a package, you can still do
> that.  But this prevents problems where helper code can't be shared between
> the two packages without ugliness and/or dumb code churnm, and it means
> anyone can add unit tests for non-exported code without having to create a
> whole new file and testsuite.
>

I do not want anyone to add unit tests for non-exported code, because those
tests are almost completely worthless. It's not a matter of personal
preference; it's a matter of simple correctness. Keep your tests in
separate packages.

Cheers
William



>
> -Nate
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Automatic retries of hooks

2016-01-20 Thread William Reade
On Wed, Jan 20, 2016 at 2:42 PM, Dean Henrichsmeyer 
wrote:

> Hi,
>
> It seems the original point James was making is getting missed. No one is
> arguing over the value of being able to retry and/or idempotent hooks.
> Yes, you should be able to retry them and yes nothing should break if you
> run them over and over.
>
> The point made is that Juju shouldn't be automatically retrying them. The
> argument of "no one knows what went wrong so Juju automatically retrying
> them is a better experience" doesn't work. The intelligence of the stack in
> question, regardless of what it is, goes in the charms. If you start
> conflating and mixing up where the intelligence goes then creating,
> running, and debugging those distributed systems will be a nightmare.
>

Hook errors *will* happen, and often for transient reasons. In handling
this, we can choose between "users retry without understanding the details"
and "juju retries without understanding the details" [0]. I'd be happy to
make the behaviour configurable, for the rare cases when the user *does*
understand the details and wants full and detailed control, but I don't
think that's the common case.

The magic should only be in Juju's ability to effectively drive the models
> and intelligence encoded in the charms. It shouldn't make assumptions about
> what that intelligence is or what those models require.
>

Stopping on hook error can only *prevent* those charms from applying their
intelligence. No more hooks to be run => no more opportunity to react. If a
charm wants to be smart about errors, it needs to detect the errors it
*knows* about, and react to those by setting status; and to move on
*without* failing the hook, thereby giving subsequent hooks an opportunity
to be smart.

Ultimately, it comes down to the fact that there's *always* another error
case you haven't considered. If you depend on the charmer to implement
retries for specific errors, that's essentially a whitelist, and they're
stuck playing whack-a-mole forever [1]. But if the charmer can depend on
external retries, they only have to worry about maintaining a
definitely-fatal blacklist and reporting those conditions in status.

Am I making any sense here?

Cheers
William


[0] or "the system stays broken forever", I suppose :).
[1] I imagine the rational approach there is to give up, and start
whitelisting by operation rather than by error; i.e. to accept that most
errors are unknown/transient and should be dumbly retried. And given that,
why should every charmer have to roll their own retries?
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Automatic retries of hooks

2016-01-20 Thread William Reade
On Wed, Jan 20, 2016 at 8:01 PM, Dean Henrichsmeyer 
wrote:
>
> You realize James was complaining and not celebrating the "success" ? The
> fact that we can have a discussion trying to determine whether something is
> a bug or a feature indicates a problem.
>

Sorry, I didn't intend to disparage his experience; I took it as legitimate
and reasonable surprise at a change we evidently didn't communicate
adequately. But I don't think it's a misfeature; I think it's a necessary
approach, in service of global reliability in challenging environments.

But: if there are times it's inconvenient and not just surprising, we
should surely be able to disable it. Gabriel/Bogdan, would you be able to
address this?

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Wish: configurable juju's write concern when connecting to mongd

2016-01-12 Thread William Reade
On Mon, Jan 11, 2016 at 5:14 PM, Mario Splivalo <
mario.spliv...@canonical.com> wrote:

> Hello.
>
> I have a customer that ended up with broken juju database - the issue
> and possible cause is explained here: http://pad.lv/1528261
>
> I don't have enough data to verify what exactly happened, but most
> likely something like this happened:
>
> 1. jujud wrote to PRIMARY, with write concern of 'majority'
> 2. as SECONDARYs are lagged (usually the customers run them inside VMs),
> only when changes replicated to ONE of the SECONDARYes, mongod returned
> 'all good, carry on' to jujud
> 3. PRIMARY lost connectivity with the rest of the replicaset.
> 4. SECONDARYs decided to vote for the new PRIMARY - the SECONDAY which
> haven't had all the data replicated to it was choosen as new PRIMARY.
>

This specific scenario was once a problem but should be fine in 2.4.6 [0].


> 5. former-PRIMARY joins the replicaset, and destroys (rollbacks) all of
> the unreplicated changes
>
> And now we have a situation that juju thinks that the data is written to
> the database, where in fact that data doesn't exists.
>

There is, however, still a mechanism whereby parts of juju can think a
value has been written when it actually hasn't [1].

Now, if we could tell juju to use 'writeconcern' of 3, situation like
> above wouldn't happen, as we are always sure that all the data changes
> are written to all of the servers (assuming, of course, we run
> replicaset with three nodes).
> In the event of one server going down, writes to mongodb would stop,


This would make it hard to be HA...


> as
> there are now only two servers to write to, and we are asking mongo to
> confirm writes to three servers. But, we are safe, data-wise, and no
> data will be lost.
>

...and I don't *think* we'd fully address the read-uncommitted problem even
if we could work around the loss of HA mongo.

With the option to rec-configure jujud to use write-concern of 2, we
> could re-enable writes to the mongod, at least until we bring back
> broken SECONDARY back to life.
>
> Does this makes any sense?
>

I see where you're coming from, but I don't think it'd help. AFAICS the
problem isn't that writes are acked too early; it's that concurrent
flushers can read uncommitted transaction data; and record references to
those transactions; which could then end up dangling.

I'm asking this because in real-life deployments situation as described
> above is not so uncommon. Especially when jujud state servers (and their
> mongodb nodes) are inside VMs, there is a possibility that the slaves
> would lag behind the master.
>

In practice, it seems that the bad behaviour is triggered by having "too
many" concurrent txns touching the same document. The assert-address txns
referenced in the bug were particularly serious offenders there, and have
long since been fixed; but all the same, I'd like to remind all developers
who need to touch state that their txn footprints should be as dainty as
possible. This is not the first time that enthusiastically-overlapping
transactions have given us grief.

Cheers
William

[0] https://aphyr.com/posts/284-jepsen-mongodb#comment-602
[1] https://aphyr.com/posts/322-jepsen-mongodb-stale-reads



> Mario
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Be very careful in assumptions in tests

2015-12-04 Thread William Reade
Indeed, and to expand on all that: test-first or test-last, it's smart to
purposely break your code and make sure that your test *fails* semi-cleanly
(rather than, e.g., deadlocking).

On Fri, Dec 4, 2015 at 3:32 AM, Nate Finch  wrote:

> I'll definitely +1 the need for gc.HasLen ... I've seen a ton of panics in
> tests if the code starts erroneously returning nil slices.  Obviously this
> is less bad, since the tests still fail, but they're really ugly annoying
> failures.
>
> And +1000 to making tests that fail before fixing the code (or at least,
> purposely re-breaking the code if you forget and fix the bug first)... I've
> caught so many invalid tests that way... it is the only way to prove to
> yourself that your test actually work correctly both in proving correct
> behavior, and signalling bad behavior.
>
> On Thu, Dec 3, 2015 at 5:24 PM Tim Penhey 
> wrote:
>
>> Hi folks,
>>
>> I came across an interesting bug yesterday and during investigation
>> found that there was a very comprehensive test that covered the situation.
>>
>> The problem is that the asserts were not actually running.
>>
>> The asserts were inside a loop where the expectation was that the loop
>> would run exactly once as a previous call returned an array of items
>> with one value in it.
>>
>> Except it didn't.
>>
>> It returned an empty list.
>>
>> So, if you see something like this, a simple line like:
>>
>>   c.Assert(someList, gc.HasLen, 1)
>>
>> Is sufficient to have caught this particular issue.
>>
>>
>> Remember a key step when writing tests is to have the test fail first,
>> then make it pass.
>>
>> One trick I use is to assert checks against data I know is wrong. If the
>> test passes then I know I have other problems.
>>
>> Tim
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: apiserver authorizers

2015-12-03 Thread William Reade
(it's not quite done, but that's in the nature of wikis, and I have to EOD
sharpish today...)

On Thu, Dec 3, 2015 at 5:54 PM, William Reade <william.re...@canonical.com>
wrote:

> Thanks for asking :). Not sure there's a document directly focused on it,
> largely because it's relatively simple -- doc/design/
> juju-api-design-specification.md just says:
>
> > The authorizer represents a value that can be asked for authorization
> > information on its associated authenticated entity. It allows an API
> > implementation to ask questions about the client that is currently
> connected.
>
> ...which does tell you the important thing; and the interface is
> documented reasonably clearly in `apiserver/common/interfaces.go`; for
> example:
>
> > // AuthMachineAgent returns whether the authenticated entity is a
> > // machine agent.
> > AuthMachineAgent() bool
>
> Honestly, I was expecting it to need to change more than it apparently has
> -- it actually seems to have covered our needs pretty well. Anyway, in a
> bit more detail:
>
>   * Your facade constructor is passed a common.Authorizer that you can
> interrogate to discover properties of the authenticated entity at the other
> end of the connection.
>   * You should definitely use it to restrict access to the facade as a
> whole, so that (e.g.) machine agents can't talk to external facades, and
> normal machine agents can't talk to environ-manager-only facades, etc etc.
>   * You should probably use it internally as well, to check the actual
> arguments refer to entities your worker should be allowed to see.
> (sometimes you don't need to; judgment call; but err in favour of checking
> if you're at all unsure whether unrestricted access is a good idea for the
> client in question.)
>   * Those restrictions should be as tight as possible: e.g. if it's a
> facade for a per-machine-agent worker, it almost certainly shouldn't be
> allowed to query or modify other machines, etc etc.
>   * To restate: don't worry about restricting potentially valid use cases,
> worry about leaking information or allowing operations that could be abused
> by buggy or malicious code.
>
> Regardless, I've just written a wiki page which attempts to illustrate
> basic good practice: https://github.com/juju/juju/wiki/API-best-practices
>
> Please let me know if it's useful and/or can/should be improved :-).
>
> Cheers
> William
>
>
> On Wed, Dec 2, 2015 at 2:54 PM, Nate Finch <nate.fi...@canonical.com>
> wrote:
>
>> Auth on the api is a big mystery to me. Is there a document on the
>> expected behavior and the functions and types that back it up?
>>
>> For example, you said "an environ provisioner may have wide-ranging
>> powers, but that's no reason to let it see or modify container machines
>> that are not its direct responsibility" ... I don't really know what that
>> means. This sounds like authorization, but how do you know who is calling
>> your api or what they're supposed to be allowed to do?
>>
>> On Wed, Dec 2, 2015, 6:36 AM William Reade <william.re...@canonical.com>
>> wrote:
>>
>>> The case I spotted yesterday was here:
>>> http://reviews.vapour.ws/r/3243/diff/1/?file=167369#file167369line41
>>>
>>> and in general:
>>>
>>>   * seeing `_ common.Authorizer` in a parameter list is a sign you're
>>> DIW.
>>>   * failing to check some auth property when creating the facade is a
>>> sign you're DIW.
>>>   * failing to pass the auth on to the facade type itself *might* be OK,
>>> but should be viewed with some suspicion -- for example:
>>>   * an environ provisioner may have wide-ranging powers, but that's
>>> no reason to let it see or modify container machines that are not its
>>> direct responsibility
>>>   * right now, many user-facing facades don't need further
>>> authorization once they know it's an authenticated client, but that won't
>>> last forever
>>>
>>> helpful?
>>>
>>> Cheers
>>> William
>>>
>>> On Wed, Dec 2, 2015 at 12:26 PM, roger peppe <roger.pe...@canonical.com>
>>> wrote:
>>>
>>>> Could you link to the offending change(s) please, so we
>>>> can see what doing it wrong looks like?
>>>>
>>>> On 2 December 2015 at 09:28, William Reade <william.re...@canonical.com>
>>>> wrote:
>>>> > I just noticed that the unitassigner facade-constructor drops the
>>>> authorizer
>>>> > on the floor; and I caught a similar case in a review yesterday (that
>

apiserver authorizers

2015-12-02 Thread William Reade
I just noticed that the unitassigner facade-constructor drops the
authorizer on the floor; and I caught a similar case in a review yesterday
(that had already been LGTMed by someone else).

Doing that means that *any* api connection can use the thus-unprotected
facade -- clients, agents, and malicious code running in a compromised
machine and using the agent credentials. I don't think we have any APIs
where this is actually a good idea; the best I could say about any such
case is that it's not *actively* harmful *right now*. But big exploits are
made of little holes, let's make an effort not to open them in the first
place.

Moonstone, please fix the unitassigner facade ASAP; everyone else, be told,
and keep an extra eye out for this issue in reviews :).

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: apiserver authorizers

2015-12-02 Thread William Reade
The case I spotted yesterday was here:
http://reviews.vapour.ws/r/3243/diff/1/?file=167369#file167369line41

and in general:

  * seeing `_ common.Authorizer` in a parameter list is a sign you're DIW.
  * failing to check some auth property when creating the facade is a sign
you're DIW.
  * failing to pass the auth on to the facade type itself *might* be OK,
but should be viewed with some suspicion -- for example:
  * an environ provisioner may have wide-ranging powers, but that's no
reason to let it see or modify container machines that are not its direct
responsibility
  * right now, many user-facing facades don't need further
authorization once they know it's an authenticated client, but that won't
last forever

helpful?

Cheers
William

On Wed, Dec 2, 2015 at 12:26 PM, roger peppe <roger.pe...@canonical.com>
wrote:

> Could you link to the offending change(s) please, so we
> can see what doing it wrong looks like?
>
> On 2 December 2015 at 09:28, William Reade <william.re...@canonical.com>
> wrote:
> > I just noticed that the unitassigner facade-constructor drops the
> authorizer
> > on the floor; and I caught a similar case in a review yesterday (that had
> > already been LGTMed by someone else).
> >
> > Doing that means that *any* api connection can use the thus-unprotected
> > facade -- clients, agents, and malicious code running in a compromised
> > machine and using the agent credentials. I don't think we have any APIs
> > where this is actually a good idea; the best I could say about any such
> case
> > is that it's not *actively* harmful *right now*. But big exploits are
> made
> > of little holes, let's make an effort not to open them in the first
> place.
> >
> > Moonstone, please fix the unitassigner facade ASAP; everyone else, be
> told,
> > and keep an extra eye out for this issue in reviews :).
> >
> > Cheers
> > William
> >
> > --
> > Juju-dev mailing list
> > Juju-dev@lists.ubuntu.com
> > Modify settings or unsubscribe at:
> > https://lists.ubuntu.com/mailman/listinfo/juju-dev
> >
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: utils/fslock needs to DIAF

2015-12-01 Thread William Reade
Fully agree that fslock is terrible, but strongly against spending
significant engineering effort on it: AFAIAA, core won't need it any more
when we've integrated the agents, and that work is progressing nicely.
Given that, do we *really* need the os-agnostic abstraction we're
discussing?

Cheers
William

On Tue, Dec 1, 2015 at 10:10 AM, roger peppe 
wrote:

> So I'm with axw on this one - flock seems like it is a reasonable tool for
> the job here. FWIW a Unix domain socket also suffers from the "won't
> work across NFS" problem. Abstract unix domain sockets also
> have the problem that they won't work with long file paths (what
> were they thinking?)
>
> We have been using github.com/camlistore/lock and although it's not
> totally ideal (see https://github.com/camlistore/lock/issues/10 )
> it does the job. Note that it's non-blocking, so a blocking
> layer above it is necessary, for example see the lockFile
> function in
> https://github.com/juju/persistent-cookiejar/blob/master/serialize.go
>
> The Windows named mutex thing does sound interesting because
> it's a blocking API, which is actually what we need. On the other
> hand, under Windows, files opened for writing are locked by default
> anyway, so it might be easier just to leverage that property.
> The camlistore lock code could use some improvement for the
> Windows platform - we could either fork it, or bradfitz would
> probably be amenable to a PR.
>
>   cheers,
> rog.
>
> On 1 December 2015 at 04:12, Nate Finch  wrote:
> > I'm not a linux expert, but definitely a named mutex is exactly the
> correct
> > thing to use for Windows.  Using something else for this purpose would be
> > very surprising to a Windows dev and much more likely to be buggy.  I
> don't
> > see any reason we need to use the exact same implementation on all OSes
> if
> > there is something that does exactly the right thing without any
> finagling.
> > FWIW, mutexes do die with the process:
> >
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms682411(v=vs.85).aspx
> >
> > On Mon, Nov 30, 2015 at 8:29 PM Andrew Wilkins
> >  wrote:
> >>
> >> On Tue, Dec 1, 2015 at 9:07 AM David Cheney  >
> >> wrote:
> >>>
> >>> http://0pointer.de/blog/projects/locking.html
> >>>
> >>> In short, opening the same file twice and asserting a lock on it will
> >>> succeed.
> >>
> >>
> >> Thanks. The article is a bit exasperated. Yes, there are problems to be
> >> aware of, but it doesn't make them unusable in all cases.
> >>  - Juju agents don't get installed onto NFS file systems, so doesn't
> >> matter for the agents.
> >>  - We're in full control of the files we're locking, we're not locking
> >> some file like /etc/passwd where some other random bit of code in the
> >> process is going to open/close it and release the lock by accident.
> >>  - We don't need byte-range locking.
> >>
> >> So only the original uncertainty remains: do we care about clients
> running
> >> their home directory on an NFS share, where the NFS *server* is too old
> to
> >> support flock?
> >>
> >> Maybe a named mutex on Windows and a domain socket on *NIX is the way to
> >> go. I'm not dead set on flock; I just want something that is simple,
> robust
> >> and portable.
> >>
> >> Cheers,
> >> Andrew
> >>
> >>> On Tue, Dec 1, 2015 at 12:04 PM, Andrew Wilkins
> >>>  wrote:
> >>> > On Tue, Dec 1, 2015 at 8:57 AM David Cheney
> >>> > 
> >>> > wrote:
> >>> >>
> >>> >> fcntl won't work in threaded go applications, it barely works in non
> >>> >> threaded applications.
> >>> >
> >>> >
> >>> > This is news to me. I've used it plenty in the past, in
> multi-threaded
> >>> > programs. Please point me at relevant literature that explains where
> it
> >>> > doesn't work.
> >>> >
> >>> >>
> >>> >> I'm not interested in "doesn't work on windows" arguments. Yes, we
> >>> >> have to support windows, but that doesn't mean we have to be dragged
> >>> >> down to it's lowest common denominator.
> >>> >
> >>> >
> >>> > Agreed, if we're actually crippling anything.
> >>> >
> >>> >>
> >>> >> I think it's fine to develop a lock type that does the best
> available
> >>> >> for each platform using conditional compilation.
> >>> >
> >>> >
> >>> > Again, agreed, but only if there's something to be gained by doing
> >>> > this. I'm
> >>> > still not convinced there is.
> >>> >
> >>> >> On Tue, Dec 1, 2015 at 11:47 AM, Andrew Wilkins
> >>> >>  wrote:
> >>> >> > On Tue, Dec 1, 2015 at 8:03 AM David Cheney
> >>> >> > 
> >>> >> > wrote:
> >>> >> >>
> >>> >> >> On Tue, Dec 1, 2015 at 11:00 AM, Andrew Wilkins
> >>> >> >>  wrote:
> >>> >> >> > On Tue, Dec 1, 2015 at 7:57 AM David Cheney
> >>> >> >> > 
> >>> >> >> > wrote:
> >>> >> >> >>
> >>> >> >> >> Doesn't 

Re: Running upgrade steps for units

2015-09-15 Thread William Reade
Having the machine agent run unit agent upgrade steps would be a Bad Thing
-- the unit agents are still actively running the old code at that point.
Stopping the unit agents and managing the upgrade purely from the machine
would be ok; but it feels like a lot of effort for very little payoff, so
I'm most inclined to WONTFIX it and spend the energy on agent consolidation
instead.

Don't recall deciding not to execute upgrade steps on unit agents, but I
can equally see why the agent code made that a rational decision, so, ehh.
I suspect we should have had some tests that the upgrade framework rejected
unit agent tags, though.

Cheers
William

On Tue, Sep 15, 2015 at 7:13 AM, Tim Penhey 
wrote:

> Hi William and Menno,
>
> I've been investigating this bug:
> https://bugs.launchpad.net/juju-core/+bug/1494070
>
> The upgrade steps were never planned to be executed for unit agents,
> only on machine agents, but there are two upgrade steps identified in
> this bug where they were clearly expecting to run for units.
>
> Obviously there is a need to be able to run steps for units, but I don't
> think we should add it to unit agents, especially given the desire to
> merge the agents in the near future.
>
> What I do think we should do is provide a helper function inside the
> upgrade package that will run a step for each unit passing in the unit tag.
>
> So what we really need is the ability for the API as connected to by a
> machine to ask for a list of units that should be running on that
> machine. Do we have an API or state function for this already? I haven't
> found on in my cursory look just now, but I thought I'd raise this and
> do a sanity check at the same time.
>
> There would be clear limitations for the upgrade steps for the units as
> the API connection as as a machine agent.  However both the upgrade
> steps identified so far are just updating local disk state based on
> other disk state for a specified unit.
>
> Firstly does this sound like a reasonable approach?
>
> Tim
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: workers using *state.State

2015-09-09 Thread William Reade
Hey all

I'm sorry for the unnecessarily harsh tone in the OP, which was absolutely
not called for. Thank you all for stepping up to address it regardless.

I would like to calmly state that `doc/architectural-overview.txt` -- more
than a year old -- does state that all workers, whatever agent they're in,
should use the API server; and that we were all actively working on
api-everywhere not *that* long before then, so I am genuinely
surprised/saddened that our institutional knowledge didn't catch and
correct any of these before I threw a tantrum in person.

FWIW, I think that envworkermanager's *current* situation is justifiable:
IIRC the machine agent still needs the state conn to set up the singular
worker, and I suspect there are other subtler threads in play too. That's
not to say it *should* be doing that; but it is just invoking the general
run-env-workers code that pre-existed in the single-env agent code, and we
need to do more agent work before we can fully extract that dependency.

Cheers
William

On Tue, Sep 8, 2015 at 9:12 AM, William Reade <william.re...@canonical.com>
wrote:

> People keep writing them, in defiance of sane layering and explicit
> instructions, for the most embarrassingly trivial tasks
> (statushistorypruner? dblogpruner? txnpruner? *all* of those can and should
> pass through a simple api facade, not just dance off to play with the
> direct-db-access fairies.)
>
> There is no justification for *any* of those things to see a *state.State,
> and I'm going to start treating new workers that violate layering this way
> as deliberate sabotage attempts. Leads who have overseen the introduction
> of those workers, sort it out.
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Uniter tests for update-status hook - BLOCKER

2015-07-20 Thread William Reade
On Mon, Jul 20, 2015 at 4:57 PM, roger peppe rogpe...@gmail.com wrote:

 That's somewhat harder with the uniter, because its very state-dependent
 channel operations make it awkward to write a uniform outer select loop.

 If I were to do it, off the top of my head, I might consider making
 uniter.Mode
 (which BTW should not really be exported) into an interface and each of the
 existing mode functions into a separate types.


Yeah; oddly enough, it was partly my imagining going in more-or-less that
direction (but implementing the modes in another package) that led to them
being written as exported.

However, I think the mode approach has outlived its usefulness. After a
couple of years, Modes are now either trivial or bloated; in ModeAbide in
particular, there are too many possible inputs and reactions to keep track
of them in one select loop. And it's becoming increasingly inconvenient to
depend on the golang scheduler; and, basically, we need an operation queue
that self-prioritises/compacts as events come in.

It won't be trivial -- in particular, we need to disentangle the
local-state storage from the operations that currently generate it -- but
the queue can be synchronous; it can be composed of smaller queues that
manage priority of closely-related hooks (as relation hooks are implemented
today); and it can actually be unit-tested in adequate detail. And the
event delivery might still not be beautiful, but I'm pretty sure we can do
better than the existing Filter if we don't constrain ourselves by
optimizing the interface for delivering events to mode loops.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: I'm concerned

2015-06-17 Thread William Reade
I think the problem is in the implicit apiserver-leasemgr-state
dependencies; if the lease manager is stopped at the wrong moment, the
apiserver will never shut down because it's waiting on a blocked leasemgr
call. I'll propose something today.

On Wed, Jun 17, 2015 at 7:33 AM, David Cheney david.che...@canonical.com
wrote:

 This should be achievable. go test sends SIGQUIT on timeout, we can
 setup a SIGQUIT handler in the topmost suite (or import it as a side
 effect package), do whatever cleanup is needed, then os.Exit, unhandle
 the signal and try to send SIGQUIT to ourselves, or just panic.

 On Wed, Jun 17, 2015 at 3:25 PM, Tim Penhey tim.pen...@canonical.com
 wrote:
  Hey team,
 
  I am getting more and more concerned about the length of time that
  master has been cursed.
 
  It seems that sometime recently we have introduced serious instability
  in cmd/jujud/agent, and it is often getting wedged and killed by the
  test timeout.
 
  I have spent some time looking, but I have not yet found a definitive
  cause.  At least some of the time the agent is failing to stop and is
  deadlocked.
 
  This is an intermittent failure, but intermittent enough that often at
  least one of the unit test runs fails with this problem cursing the
  entire run.
 
  One think I have considered to aid in the debugging is to add some code
  to the juju base suites somewhere (or in testing) that adds a goroutine
  that will dump the gocheck log just before the test gets killed due to
  timeout - perhaps a minute before. Not sure if we have access to the
  timeout or not, but we can at least make a sensible guess.
 
  This would give us at least some logging to work through on these
  situations where the test is getting killed due to running too long.
 
  If no one looks at this and fixes it overnight, I'll start poking it
  with a long stick tomorrow.
 
  Cheers,
  Tim
 
  --
  Juju-dev mailing list
  Juju-dev@lists.ubuntu.com
  Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: I'm concerned

2015-06-17 Thread William Reade
...but I think that axw actually addressed that already. Not sure then;
don't really have the bandwidth to investigate deeply right now. Sorry
noise.

On Wed, Jun 17, 2015 at 10:52 AM, William Reade william.re...@canonical.com
 wrote:

 I think the problem is in the implicit apiserver-leasemgr-state
 dependencies; if the lease manager is stopped at the wrong moment, the
 apiserver will never shut down because it's waiting on a blocked leasemgr
 call. I'll propose something today.

 On Wed, Jun 17, 2015 at 7:33 AM, David Cheney david.che...@canonical.com
 wrote:

 This should be achievable. go test sends SIGQUIT on timeout, we can
 setup a SIGQUIT handler in the topmost suite (or import it as a side
 effect package), do whatever cleanup is needed, then os.Exit, unhandle
 the signal and try to send SIGQUIT to ourselves, or just panic.

 On Wed, Jun 17, 2015 at 3:25 PM, Tim Penhey tim.pen...@canonical.com
 wrote:
  Hey team,
 
  I am getting more and more concerned about the length of time that
  master has been cursed.
 
  It seems that sometime recently we have introduced serious instability
  in cmd/jujud/agent, and it is often getting wedged and killed by the
  test timeout.
 
  I have spent some time looking, but I have not yet found a definitive
  cause.  At least some of the time the agent is failing to stop and is
  deadlocked.
 
  This is an intermittent failure, but intermittent enough that often at
  least one of the unit test runs fails with this problem cursing the
  entire run.
 
  One think I have considered to aid in the debugging is to add some code
  to the juju base suites somewhere (or in testing) that adds a goroutine
  that will dump the gocheck log just before the test gets killed due to
  timeout - perhaps a minute before. Not sure if we have access to the
  timeout or not, but we can at least make a sensible guess.
 
  This would give us at least some logging to work through on these
  situations where the test is getting killed due to running too long.
 
  If no one looks at this and fixes it overnight, I'll start poking it
  with a long stick tomorrow.
 
  Cheers,
  Tim
 
  --
  Juju-dev mailing list
  Juju-dev@lists.ubuntu.com
  Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: juju deployer or juju quickstart on pure go

2015-06-04 Thread William Reade
Hi Vasiliy

If you're interested in bundle processing with golang, we should definitely
chat. A plugin would be perfectly possible, but it'd be *much* nicer (and
more generally useful) to have the code you want to write running directly
in the state server. If you'd like to talk directly, please ping me; I'm
fwereade on freenode, usually in #juju-dev.

Cheers
William

On Thu, Jun 4, 2015 at 9:08 AM, Mark Shuttleworth m...@ubuntu.com wrote:

  Hi Vasiliy

 We'd like to move bundle processing into Go, and actually build it
 straight into Juju core. If you are interested to help with that, William
 cc'd would be a good person to chat with, or could point you to the right
 person.

 Thank you!
 Mark



 On 04/06/15 00:37, Jeff Pihach wrote:

 Hi Vasiliy, Both projects are open source and written in python[1][2]. Is
 there a reason why you would like them in Go instead?

 While it may look like they simply deploy charms, in reality there is a lot
 of processing necessary to generate the proper api calls to deploy bundles
 according to the spec. This parsing is now handled by another python lib
 called bundlelib [3]. If you're interested in exploring this further I'd
 recommend checking out these projects first.

 Thanks
 - Jeff

 [1] https://code.launchpad.net/juju-quickstart
 [2] https://code.launchpad.net/juju-deployer
 [3] https://github.com/juju/juju-bundlelib


 On Wed, Jun 3, 2015 at 12:50 AM, Vasiliy Tolstov v.tols...@selfip.ru 
 v.tols...@selfip.ru
 wrote:


  As i see (not very deep knowledge) juju quickstart and juju deployer
 parses bundle.yaml get all needed charms and deploy it. If i'm write
 go based deployer that do exactly this things, does is possible to use
 it like juju quickstart / juju deployer (now only internal my usage).
 Or this tools handle more needed work and things not so simple ?

 --
 Vasiliy Tolstov,
 e-mail: v.tols...@selfip.ru

 --
 Juju mailing listj...@lists.ubuntu.com
 Modify settings or unsubscribe 
 at:https://lists.ubuntu.com/mailman/listinfo/juju





-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


Re: Writing workers

2015-06-01 Thread William Reade
https://github.com/juju/juju/wiki/Guidelines-for-writing-workers

On Mon, Jun 1, 2015 at 3:32 PM, Nate Finch nate.fi...@canonical.com wrote:

 Totally belongs on the wiki.

 On Mon, Jun 1, 2015 at 8:45 AM, John Meinel j...@arbash-meinel.com
 wrote:

 This is one of those things that should probably end up on the Wiki.
 Thanks for writing it up.

 John
 =:-


 On Thu, May 28, 2015 at 5:34 PM, William Reade 
 william.re...@canonical.com wrote:

 Hi all

 I've noticed that there's a lot of confusion over how to write a useful
 worker. Here follow some guidelines that you should be *very* certain of
 yourself before breaking (and probably talk to me about anyway). If there's
 any uncertainty about these, I'm more than happy to expand.

   * If you really just want to run a dumb function on its own goroutine,
 use worker.NewSimpleWorker.

   * If you just want to do something every period, use
 worker.NewPeriodicWorker.

   * If you want to react to watcher events, you should probably use
 worker.NewNotifyWorker or worker.NewStringsWorker.

   * If your worker has any methods outside the Worker interface, DO NOT
 use any of the above callback-style workers. Those methods, that need to
 communicate with the main goroutine, *need* to know that goroutine's state,
 so that they don't just hang forever.

   * To restate the previous point: basically *never* do a naked channel
 send/receive. If you're building a structure that makes you think you need
 them, you're most likely building the wrong structure.

   * If you're writing a custom worker, and not using a tomb.Tomb, you
 are almost certainly doing it wrong. Read the blog post [0] or, hell, just
 read the code [1] -- it's less than 200 lines and it's about 50% comments.

   * If you're letting tomb.ErrDying leak out of your workers to any
 clients, you are definitely doing it wrong -- you risk stopping another
 worker with that same error, which will quite rightly panic (because that
 tomb is *not* yet dying).

   * If it's possible for your worker to call .tomb.Done() more than
 once, or less than once, you are *definitely* doing it very very wrong
 indeed.

   * If you're using .tomb.Dead(), you are very probably doing it wrong
 -- the only reason (that I'm aware of) to select on that .Dead() rather
 than on .Dying() is to leak inappropriate information to your clients. They
 don't care if you're dying or dead; they care only that the component is no
 longer functioning reliably and cannot fulfil their requests. Full stop.
 Whatever started the component needs to know why it failed, but that parent
 is usually not the same entity as the client that's calling methods.

   * If you're using worker/singular, you are quite likely to be doing it
 wrong, because you've written a worker that breaks when distributed. Things
 like provisioner and firewaller only work that way because we weren't smart
 enough to write them better; but you should generally be writing workers
 that collaborate correctly with themselves, and eschewing the temptation to
 depend on the funky layer-breaking of singular.

   * If you're passing a *state.State into your worker, you are almost
 certainly doing it wrong. The layers go worker-apiserver-state, and any
 attempt to skip past the apiserver layer should be viewed with *extreme*
 suspicion.

   * Don't try to make a worker into a singleton (this isn't particularly
 related to workers, really, singleton is enough of an antipattern on its
 own [2] [3] [4]). Singletons are basically the same as global variables,
 except even worse, and if you try to make them responsible for goroutines
 they become more horrible still.

 Did I miss anything major? Probably. If so, please remind me.

 Cheers
 William


 [0] http://blog.labix.org/2011/10/09/death-of-goroutines-under-control
 [1] launchpad.net/tomb (apparently... we really ought to be using v2,
 though)
 [2]
 https://sites.google.com/site/steveyegge2/singleton-considered-stupid
 [3]
 http://jalf.dk/blog/2010/03/singletons-solving-problems-you-didnt-know-you-never-had-since-1995/
 [4]
 http://programmers.stackexchange.com/questions/40373/so-singletons-are-bad-then-what/

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: gce provider options

2015-05-28 Thread William Reade
It looks like it's a wart -- AFAICS it's just another name for
image-metadata-url, but it's not actually hooked up to anything. Eric,
Wayne, any comments?

On Wed, May 27, 2015 at 8:28 PM, Nick Veitch nick.vei...@canonical.com
wrote:

 I notice in the boilerplate for the GCE provider, there is a key
  'image-endpoint:'

 This option hasn't featured anywhere else as far as I know, and wasn't
 included when we did the big roundup of all the config options imaginable a
 few months back (
 https://jujucharms.com/docs/stable/config-general#alphabetical-list-of-general-configuration-values
 )

 Is this something new? Is this analogous to something else? From the
 example I can work out what it does, but it would be nice to have some
 definitive information to add to the docs.

 Thanks

 --
 Nick Veitch
 nick.vei...@canonical.com

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Writing workers

2015-05-28 Thread William Reade
Hi all

I've noticed that there's a lot of confusion over how to write a useful
worker. Here follow some guidelines that you should be *very* certain of
yourself before breaking (and probably talk to me about anyway). If there's
any uncertainty about these, I'm more than happy to expand.

  * If you really just want to run a dumb function on its own goroutine,
use worker.NewSimpleWorker.

  * If you just want to do something every period, use
worker.NewPeriodicWorker.

  * If you want to react to watcher events, you should probably use
worker.NewNotifyWorker or worker.NewStringsWorker.

  * If your worker has any methods outside the Worker interface, DO NOT use
any of the above callback-style workers. Those methods, that need to
communicate with the main goroutine, *need* to know that goroutine's state,
so that they don't just hang forever.

  * To restate the previous point: basically *never* do a naked channel
send/receive. If you're building a structure that makes you think you need
them, you're most likely building the wrong structure.

  * If you're writing a custom worker, and not using a tomb.Tomb, you are
almost certainly doing it wrong. Read the blog post [0] or, hell, just read
the code [1] -- it's less than 200 lines and it's about 50% comments.

  * If you're letting tomb.ErrDying leak out of your workers to any
clients, you are definitely doing it wrong -- you risk stopping another
worker with that same error, which will quite rightly panic (because that
tomb is *not* yet dying).

  * If it's possible for your worker to call .tomb.Done() more than once,
or less than once, you are *definitely* doing it very very wrong indeed.

  * If you're using .tomb.Dead(), you are very probably doing it wrong --
the only reason (that I'm aware of) to select on that .Dead() rather than
on .Dying() is to leak inappropriate information to your clients. They
don't care if you're dying or dead; they care only that the component is no
longer functioning reliably and cannot fulfil their requests. Full stop.
Whatever started the component needs to know why it failed, but that parent
is usually not the same entity as the client that's calling methods.

  * If you're using worker/singular, you are quite likely to be doing it
wrong, because you've written a worker that breaks when distributed. Things
like provisioner and firewaller only work that way because we weren't smart
enough to write them better; but you should generally be writing workers
that collaborate correctly with themselves, and eschewing the temptation to
depend on the funky layer-breaking of singular.

  * If you're passing a *state.State into your worker, you are almost
certainly doing it wrong. The layers go worker-apiserver-state, and any
attempt to skip past the apiserver layer should be viewed with *extreme*
suspicion.

  * Don't try to make a worker into a singleton (this isn't particularly
related to workers, really, singleton is enough of an antipattern on its
own [2] [3] [4]). Singletons are basically the same as global variables,
except even worse, and if you try to make them responsible for goroutines
they become more horrible still.

Did I miss anything major? Probably. If so, please remind me.

Cheers
William


[0] http://blog.labix.org/2011/10/09/death-of-goroutines-under-control
[1] launchpad.net/tomb (apparently... we really ought to be using v2,
though)
[2] https://sites.google.com/site/steveyegge2/singleton-considered-stupid
[3]
http://jalf.dk/blog/2010/03/singletons-solving-problems-you-didnt-know-you-never-had-since-1995/
[4]
http://programmers.stackexchange.com/questions/40373/so-singletons-are-bad-then-what/
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Is simplestreams spam worth having in the Log

2015-04-01 Thread William Reade
+1000

On Wed, Apr 1, 2015 at 11:47 AM, John Meinel j...@arbash-meinel.com wrote:

 I've been noticing lately that everytime a test fails it ends up having a
 *lot* of lines about failing to find simplestreams headers. (this last test
 failure had about 200 long lines of that, and only 6 lines of actual
 failure message that was useful).

 Now I think there are a few things to look at here:

 1) The lines about looking for any double up and occur 9 times. Why are
 we repeating the search for tools 9 times in TestUpgradeCharmDir? maybe
 its genuine, but it sure feels like we're doing work over and over again
 that could be done once.

 2) We still default to reporting every failed index.json lookup, and *not*
 reporting the one that succeeded. Now these are at DEBUG level, but I have
 the feeling their utility is low enough that we should actually switch them
 to TRACE and *start* logging the one we successfully found at DEBUG level.

 Thoughts?

 John
 =:-

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: serializing to json

2015-04-01 Thread William Reade
On Tue, Mar 31, 2015 at 8:22 PM, Nate Finch nate.fi...@canonical.com
wrote:

 Struct tags that are the same as the name of field are never needed:

 type SomeResult struct {
 Error *Error`json:Error`
 Somethings []Something `json:Somethings`
 }

 The above is totally unnecessary... those fields will already serialize
 into Error and Somethings.  There's a bunch of this in the
 apiserver/params package... you don't need it, so don't do it.   It just
 causes confusion, because it looks like you're changing the name from the
 default, even though you're not.


Tim's pretty much covered this already, but I wanted to note the other big
problem with the params package: that it pulls in types from elsewhere, and
those types are a *massive* liability. For data that goes over the wire,
and data that goes into a database, please *always* be *local* and
*explicit* about what you're storing. As far as I'm concerned:

1) putting a *charm.Meta (or equivalent) direct into a database (or sending
it over the wire) is completely unjustifiable, because you're explicitly
dropping control of what you're storing/sending: some other guy can make a
perfectly reasonable change to a completely unrelated package and break
your wire format. This is a Very Bad Thing, where explicitly copying
structures at layer boundaries is merely Somewhat Tedious.

2) failing to use serialization annotations is essentially the same
mistake, because it makes it way too easy for a well-meaning programmer to
change a field name and again, whoops, break the wire format. Changing
field names is normal and good -- code evolves, abstractions change
(hopefully for the better) -- and implicit serialisation has chilling
effects there and leads to paranoid code fossilisation. (Or, worse, it
doesn't, and we just put up with a constant risk of arbitrary subtle
breakage.)

Cheers
William




 -Nate

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: On not copy-pasting state and params struct conversions

2015-02-10 Thread William Reade
So, I think you *do* need conversion code specific to state -- the
methods exported from state should be in terms of the storage types,
because they're the language we've chosen to model storage, so there
needs to be conversion code in state (or possibly in an internal
package -- but either way there's no reason to expose it to anyone
else).

WRT params, though, we already have two groups of client packages, so
the same forces don't quite apply: and I still prefer that we *not*
import other packages into params [0]. How about a single
params/convert package?

Cheers
William

[0] this is because the temptation to use a storage.Foo in place of a
params.Foo has historically been overwhelming (and while it's *very
dangerous* to do that it's hard to make the danger sufficiently
obvious to dissuade distracted devs from doing so).

On Tue, Feb 10, 2015 at 11:07 AM, Andrew Wilkins
andrew.wilk...@canonical.com wrote:
 Hi all,

 Anastasia raised an issue in http://reviews.vapour.ws/r/885/ about how to
 cut down on struct conversions between params, state, and domain packages.
 In this case we're talking about storage. The following API server facades
 currently participate in storage:
- client
- storage
- provisioner
- diskmanager (to be renamed, this lists block devices on machines)
- diskformatter
- storageworker (to be renamed, this is the dynamic storage provisioner)

 Each facade have some overlap in dealing with storage entities, e.g. the
 diskformatter and diskmanager each need to deal with block devices. This
 leads to much duplication of struct copying/conversion code when toing and
 froing between state and clients.

 I don't want to go adding conversion code to the params, state or storage
 packages, as they really shouldn't have dependencies on each other. Does
 anyone have a good idea about where to put this common functionality? Maybe
 api/common/storage, apiserver/common/storage? Does not appeal, but I
 can't think of a better option.

 Cheers,
 Andrew

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: A home for underappreciated errors

2015-02-10 Thread William Reade
On Tue, Feb 10, 2015 at 12:13 PM, William Reade
william.re...@canonical.com wrote:
 really ought to include a bunch more mechanism for finding out
 *precisely* what should have been assigned to what

To be clear: I do not think we should do this, because the
interpretation of those entities would themselves be domain-specific
-- if client code is written to expect a unit id and ends up suddenly
seeing a volume id, the *best* case is that it fails -- the worst is
that it keeps on running with bad data and smearing the effects of the
bug further through the system with arbitrary further fallout.

+100 to domain-specific errors.

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: A home for underappreciated errors

2015-02-10 Thread William Reade
I think that NotAssigned is *too* generic. UnitNotAssignedToMachineErr
is one thing, and VolumeNotAssignedToStorageInstance is another; if we
use the same error type to represent those distinct situations, we
really ought to include a bunch more mechanism for finding out
*precisely* what should have been assigned to what (because
refactoring happens, and methods start returning the same generic
error for new reasons, potentially breaking client code that makes
assumptions about what that generic error type implies. See
worker/uniter/filter/filter.go and search for ErrTerminateAgent for a
particularly hair-raising example...)

I see what you're saying re apiserver/common, but I wouldn't really
characterise it as poking into state: apiserver is generally
implemented in terms of state, so that's a perfectly reasonable
dependency IMO. For exactly the reasons stated re generic error types
in code, the generic error codes in the api are the scary bits I now
wish we'd done differently...

Cheers
William

On Tue, Feb 10, 2015 at 10:55 AM, Andrew Wilkins
andrew.wilk...@canonical.com wrote:
 Hi all,

 Ian raised a good point in http://reviews.vapour.ws/r/885/ (caution advised,
 may cause eyebleed) about a change I made to errors:
 https://github.com/juju/errors/pull/17

 The NotAssigned error, which previously was only raised by
 state.Unit.AssignedMachineId, is now also raised by
 state.Volume.StorageInstance. I removed the error from the state package so
 we didn't have apiserver/common poking its head in there, and moved it over
 to the juju/errors package.

 The issue Ian raised is that juju/errors should contain relatively generic
 error types, and we should keep domain-specific things elsewhere. Examples
 are NotAssigned, and NotProvisioned. We're thinking of moving these
 domain-specific error types into a new package, juju/juju/errors. What do
 you think about this?

 Cheers,
 Andrew

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Detect destroying a service vs removing a unit?

2014-12-18 Thread William Reade
Yes: goal state could, I think, address this case. I'd really
appreciate more discussion on that in the Feature request: show
running relations in juju status thread -- I remain reasonably
certain that neither goal nor active is quite sufficient in
isolation, but would appreciate confirmation/pushback/whatever.

This case makes me think it'd need a slight extension regardless,
though, in that we don't currently tell units when they themselves are
meant to be shutting down. (Coincidentally, seeing relation-broken in
a peer relation *does* tell you that you're shutting down, but it's
not very helpful because you find out so late.)

(that is: I think we should tell units when they are dying; and also
when their service is dying. This demands both omnipresent state -- an
env var, or something -- and a hook or two to notify of the change in
either; but I'm not sure we can usefully distinguish between
scaling-down and shutting-down without knowing both, even if one of
them is communicated via goal-state)

Cheers
William



On Thu, Dec 18, 2014 at 7:32 AM, Stuart Bishop
stuart.bis...@canonical.com wrote:
 On 18 December 2014 at 12:15, John Meinel j...@arbash-meinel.com wrote:
 Stub- AFAIK there isn't something today, though William might know better.

 William, does your Active/Goal proposal address this? Having a Goal of 0
 units would be a pretty clear indication that the service is shutting down.

 Ooh... and that would be useful for plenty of other things too. For
 example, I can avoid rebalancing the replication ring until the goal
 number of units exists.

 --
 Stuart Bishop stuart.bis...@canonical.com

-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


Re: juju min version feature

2014-12-16 Thread William Reade
On Mon, Dec 15, 2014 at 5:51 PM, Nate Finch nate.fi...@canonical.com wrote:
 OK, so, many people in juju-core have talked about this with Eco, and we
 came to the conclusion that per-feature flags would be way too confusing for
 the charm consumer.  When I want to deploy a charm, I don't wnat to have to
 figure out what version of juju supports leader election so I can know if
 the charm I want is supported by my version of juju.  I just want to see a
 min version and compare it to my version of juju.

The idea was that consumers should never be presented with this
choice: they care about *workloads*, not about feature flags *or*
minimum versions. They say juju deploy postgres and juju says sir
yes sir; and they get the most recent version of the postgres charm
that can run on the environment they're using. IIRC the charm store
API has included an []warnings in the info results, from pretty much
v1 onwards, so we even have a pre-existing path for notifying people
that there is a more recent version available that's not being used.

 This does put a little more burden on charm authors, to do that translation
 themselves, but they would need to be able to do that anyway to understand
 what versions of juju support their charm.  This way we at least take that
 burden off the person deploying the charm.

To reiterate: *any* approach that puts the burden on *users* is
unacceptable. From their perspective juju needs to Just Work.

 Also, we very specifically only support min version.  This just means we'll
 need to be backwards compatible. There won't be leader election 2.0 that
 makes 1.0 not work.

...which is, well, the problem. What's the benefit of hamstringing our
future selves in this way? If we pre-emptively declare Compatibility
Forever we make it impossible to close the feedback loops that would
otherwise allow us to improve everyone's experiences. WRT leader
election in particular: leader election is evidently applicable in a
wide variety of contexts, but I bet there are some we haven't thought
of yet. I'm reluctant to declare that the approach we've agreed upon
is the best of all *possible* approaches -- it's just the best of the
ones we have yet been able to imagine.

Cheers
William

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: juju min version feature

2014-12-16 Thread William Reade
On Mon, Dec 15, 2014 at 8:16 PM, Nate Finch nate.fi...@canonical.com wrote:
 There was a big discussion about this in Brussels, and the conclusion was
 that we should not do per-feature flags in the metadata, because the list of
 all flags becomes too large too fast, and too hard to understand for people
 using charms.

There's nothing stopping us from defining meta-flags -- eg
compatible-1.22 implies precisely the set of flags supported in
1.22, and will continue to work until there's a version of juju that
drops one of them.

 That is not to say that we can't have a list of features and the version at
 which they're available to help charm authors figure out the right min
 version to set.  Your idea of having juju talk to the charm store to get
 that info sounds great.

The charm store has to know about this regardless, so it knows what
charms to advertise to particular versions of juju.

 It's an interesting idea to get jujud to enable/disable features to simulate
 older versions of Juju so you don't accidentally use features that exist in
 newer versions of Juju than your charm specifies but that sort of seems
 like something that's beyond the scope of this project.

I think it's inevitable. Features will interact in surprising ways:
for example, consider default-hook and leader-deposed. Leader-deposed
runs in a weird and special hook context; a standard implementation of
default-hook will almost certainly fail messily if it runs
leader-deposed without knowing it's a possibility. (As it happens,
leader-deposed will ignore errors, so this wouldn't be *fatal* -- but
it'd still be unnecessarily messy. Error spam in the logs makes nobody
happy -- and I don't think we're in a position to assume that all
future surprise interactions will be so forgiving.)

Cheers
William

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: juju min version feature

2014-12-16 Thread William Reade
On Tue, Dec 16, 2014 at 6:36 AM, Stuart Bishop
stuart.bis...@canonical.com wrote:
 I think we need the required juju version even if we also allow people
 to specify features. swift-storage could specify that it needs 'the
 version of juju that configures lxc to allow loopback mounts', which
 is a bug fix rather than a feature. Providing a feature flag for every
 bug fix that a charm may depend on is impractical.

1) If you're developing for 1.20, then I think the compatible-1.20
flag mentioned above should work as you desire, until juju changes to
the point where some feature is actively incompatible. (As stated
above, I'm expecting there will be some degree of tuning the charm
environment to the declared flags regardless.)

2) Expand on the impracticality a bit please? I imagine that when
we're talking about bugfixes of the sort you describe, the proportion
of charms that care about a given one will be small; tracking them all
may be somewhat *tedious* for the developers, but I don't see it being
especially difficult or risky -- and AFAICS it need not impact any
charm developers other than those who need that specific flag.

...not that I'm really keen to define a flag for every bugfix :-/. Do
you have a rough idea of how often you've wanted min-version so far?

Cheers
William

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Proposal: feature flag implementation for Juju

2014-11-27 Thread William Reade
So, having spoken to Tim about this, and coming to understand the
motivations a bit more, I think the env var propagated everywhere, no
exceptions approach is the winner.

The deciding factor is that environ-config requires an API connection,
and that (one of) the motivating use case(s) involves exposing
incomplete CLI commands via feature flag (and I'm not aware of ones
that can't be fulfilled by env vars). I understand that environ-config
is superior in some other respects, but it's also more complex; and
furthermore I'd like to avoid having multiple mechanisms for the same
thing.

The biggest risk is that people will start to use and depend upon
feature flag behaviour, so I require that anyone using a feature flag
make it abundantly clear that it's unsupported behaviour, by (1)
including something like DEV in the var name and (2) logging its
usage promiscuously, and making clear in those messages that the
behaviour is not supported.

Cheers
William

On Wed, Nov 26, 2014 at 4:50 PM, Casey Marshall
casey.marsh...@canonical.com wrote:
 +1 for feature flags in general and +1 for using environment variables
 in upstart to get them to the servers  agents.

 I think it'd be nice to have an environment variable per flag, with a
 common prefix JUJU_FEATURE_. That way, if you need to check one in a
 package init(), you don't have to parse the whole list of flags to find
 the one you care about -- or depend on a globally initialized parsing of
 that list.

 env config seems reasonable, but dealing with parsing, errors and then
 making that config globally available at init seems complex and not
 always feasible.

 How about defining them free form in an env config field, which is
 then used to emit the env vars as described above to the upstart config
 during bootstrap?

 -Casey

 On 11/25/2014 10:16 PM, Ian Booth wrote:
 I like feature flags so am +1 to the overall proposal. I also agree with the
 approach to keep them immutable, given the stated goals and complexity
 associated with making them not so.

 I think the env variable implementation is ok too - this keeps everything 
 very
 loosely coupled and avoids polluting a juju environment with an extra config
 attribute.

 On 26/11/14 08:47, Tim Penhey wrote:
 Hi all,

 There are often times when we want to hook up features for testing that
 we don't want exposed to the general user community.

 In the past we have hooked things up in master, then when the release
 branch is made, we have had to go and change things there.  This is a
 terrible way to do it.

 Here is my proposal:

 http://reviews.vapour.ws/r/531/diff/#

 We have an environment variable called JUJU_FEATURE_FLAGS. It contains
 comma delimited strings that are used as flags.

 The value is read when the program initializes and is not mutable.

 Simple checks can be used in the code:

 if featureflag.Enabled(foo) {
   // do foo like things
 }

 Thoughts and suggestions appreciated, but I don't want to have the
 bike-shedding go on too long.

 Tim




 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: 
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Feature Request: show running relations in 'juju status'

2014-11-19 Thread William Reade
On Tue, Nov 18, 2014 at 9:37 AM, Stuart Bishop
stuart.bis...@canonical.com wrote:
 Ok. If there is a goal state, and I am able to wait until the goal
 state is the actual state, then my needs (and amulet and juju-deployer
 needs) will be met. It does seem a rather lengthy and long winded way
 of getting there though. The question I have always needed juju to
 answer is 'are there any hooks running or are there any hooks queued
 to run?'. I've always assumed that juju must already know this (or it
 would be unable to function), but refuses to communicate this single
 bit of information in any way.

Juju as a system actually doesn't know this. Unit idleness is known
only by the unit agents themselves, and only implicitly at that -- if
we're blocking in a particular select clause then we're (probably!)
idle, and that's it. I agree that exposing idleness would be good, and
I'm doing some of the preliminary work necessary right now, but it's
not my current focus: it's just a happy side-effect of what needs to
be done for leader election.

The impact of exposing goal/active state (and hooks that trigger on
changes to same) is rather different: it's internal to a unit, and is
essentially an alternative to joined/departed hooks. (There's nothing
stopping you having both, but not much point to it either.)

 That would work too. If all units are in idle state, then the system
 has reached a steady state and my question answered.

Sort of. It's steady for now, but will not necessarily still be steady
by the time you're reacted to it -- even if you're the only
administrator, imagine a cron job that uses juju-run and triggers a
wave of relation traffic across the system.

 I'm not entirely sure how useful this feature is, given the inherent
 race conditions with serialized hooks. Right now, you need to write
 charms that gracefully cope with dependent services that have gone
 down without notice. With this feature, you will need to write charms
 that gracefully cope with dependent services that have gone down and
 the notification hasn't reached you yet. Or if the outage is for
 non-juju reasons, like a network partition. The window of time waiting
 for hooks to bubble through could easily be minutes when you have a
 simple chain of services (eg. postgresql - pgbouncer - django -
 haproxy - apache seems common enough).

Yeah, you never get away from having to cope gracefully with
unexpected failures. But there is still value there -- when one of
your remotes takes itself voluntarily out of rotation, you can know
not to send it traffic until it tells you it's ready again.

 Your example with storage is particularly interesting, as I was just
 dealing with this yesterday in my rewrite of the Cassandra charm. The
 existing mechanism in the charm is broken. If you add a new unit to
 the service, it runs its install and configure hooks and is READY. It
 then joins the peer relation, and is still READY. The peer units start
 spewing data at it, as the replication ring is rebalanced.  We now
 have a race. Will the storage hooks fire in time? The new unit unaware
 that storage is due to be attached, and does not know that, unless the
 storage is attached and the data migrated from local disk soon, the
 local disk will fill and the unit will fall over. To solve this with
 the current storage-broker subordinate, I could require the operator
 to set an 'wait_for_block_storage' boolean in the service
 configuration before deploy. But requiring people to read and follow
 the documentation is an error prone solution :-( I'm wondering if I
 should simply not bother fixing this race, and trust that the block
 storage broker hooks will be invoked and completed before local disk
 is filled. I understand that work is underway to replace the block
 storage broker so it won't be an issue long term, or your goal state
 would be useful here if a unit can ask questions like 'is storage
 going to be attached' or 'will peers be joining me'.

So, in my mind, the goal/active stuff is relevant to all relations,
not just peers. So, on the one hand: yes, assuming the relation with
storage-broker exists at the time the unit starts up, it should be
aware that there will be storage.

But on the other... dynamically adding a storage-broker relation would
still be hard; and even when storage is in the juju model, handling
dynamic storage changes is going to take a bit of effort to migrate
your data.

Cheers
William

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Actions :: UUID vs. Tag on command line

2014-10-24 Thread William Reade
On Fri, Oct 24, 2014 at 8:04 PM, John Weldon johnweld...@gmail.com wrote:
 Sure, that makes sense.  Right now the Tag encodes a legitimate sequence.
 We should probably just clean up the representation so it doesn't expose the
 internals and just exposes the unit and action sequence number.

Yeah, that works for me. Please don't expose tags in the UI -- as
gustavo says, they're implementation details. The only critically
important property of a tag is that it be a *unique* entity identifier
for API use -- and that requirement is generally at odds with a
pleasant UX.

But, yes, if the user representation happens to have a clean 2-way
mapping with the relevant tags, that makes life easier in some
respects, and I certainly won't complain about that.

Cheers
William



 --
 John Weldon

 On Fri, Oct 24, 2014 at 10:58 AM, Gustavo Niemeyer
 gustavo.nieme...@canonical.com wrote:

 It was my mistake to call it a hash.. it may be just a random id, in hex
 form. Alternatively, use a service-specific sequence number so it's better
 suited to humans. In the latter case, the sequence number must realistically
 reflect the sequence in which the actions are submitted to units, otherwise
 it would be confusing.

 On Fri Oct 24 2014 at 3:51:04 PM John Weldon johnweld...@gmail.com
 wrote:

 Thanks Gustavo;

 I think a hash would be good too.  I'll see what I can find in the juju
 code base around hash representations of id's, or come up with something.
 Any suggestions on how to generate and translate the hash are welcome
 too.

 Cheers,


 --
 John Weldon

 On Fri, Oct 24, 2014 at 10:41 AM, Gustavo Niemeyer
 gustavo.nieme...@canonical.com wrote:

 The tag (which might be better named internal id) looks like an
 implementation detail which doesn't seem right to expose. I'd suggest 
 either
 giving it a proper representation that the user can understand (a 
 sequential
 action number, for example), or use a hash. I'd also not use a UUID, btw,
 but rather just a unique hash.



 On Fri Oct 24 2014 at 2:55:45 PM John Weldon johnweld...@gmail.com
 wrote:

 Hi;

 The current actions spec indicates that the actions command line should
 return a UUID as the identifier for an action once it's been en-queued 
 using
 'juju do action'.

 Is there a compelling reason to use UUID's to identify actions, versus
 using the string representation of the Tag?


 A UUID would require a command something like:
   juju status action:9e1e5aa0-5b9d-11e4-8ed6-0800200c9a66

 which maybe we could shorten to:
   juju status action:9e1e5aa0



 I would prefer something like:
   juju status action:mysq/0_a_3

 which would be the string representation of the actions Tag.



 Is there a compelling reason to use UUID?

 Cheers,

 --
 John Weldon
 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev




 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Using subdocument _id fields for multi-environment support

2014-10-01 Thread William Reade
I'm very keen on this. Thanks Menno (and Tim); unless anyone comes up
with substantial objections, let's go with this.

Cheers
William

On Wed, Oct 1, 2014 at 6:25 AM, Menno Smits menno.sm...@canonical.com wrote:
 Team Onyx has been busy preparing for multi-environment state server
 support. One piece of this is updating almost all of Juju's collections to
 include the environment UUID in document identifiers so that data for
 multiple environments can co-exist in the same collection even when they
 otherwise have same identifier (machine id, service name, unit name etc).

 Based on discussions on juju-dev a while back[1] we have started this doing
 this by prepending the environment UUID to the _id field and adding extra
 fields which provide the environment UUID and old _id value separately for
 easier querying and handling.

 So far, services and units have been migrated. Where previously a service
 document looked like this:

 type serviceDoc struct {
  Name  string `bson:_id`
  Seriesstring
  ...

 it nows looks like this:

 type serviceDoc struct {
  DocID string `bson:_id`   // env uuid:wordpress/0
  Name  string `bson:name`  // wordpress/0
  EnvUUID   string `bson:env-uuid`  // env uuid
  Seriesstring
  ...

 Unit documents have undergone a similar transformation.

 This approach works but has a few downsides:

 it's possible for the local id (Name in this case) and EnvUUID fields to
 become out of sync with the corresponding values the make up the _id. If
 that ever happens very bad things could occur.
 it somewhat unnecessarily increases the document size, requiring that we
 effectively store some values twice
 it requires slightly awkward transformations between UUID prefixed and
 unprefixed IDs throughout the code

 MongoDB allows the _id field to be a subdocument so Tim asked me to
 experiment with this to see if it might be a cleaner way to approach the
 multi-environment conversion before we update any more collections. The code
 for these experiments can be found here:
 https://gist.github.com/mjs/2959bb3e90a8d4e7db50 (I've included the output
 as a comment on the gist).

 What I've found suggests that using a subdocument for the _id is a better
 way forward. This approach means that each field value is only stored once
 so there's no chance of the document key being out of sync with other fields
 and there's no unnecessary redundancy in the amount of data being stored.
 The fields in the _id subdocument are easy to access individually and can be
 queried separately if required. It is also possible to create indexes on
 specific fields in the _id subdocument if necessary for performance reasons.

 Using this approach, a service document would end up looking something like
 this:

 type serviceDoc struct {
  IDserviceId `bson:_id`
  Seriesstring
  ...
 }

 type serviceId struct {
   EnvUUID string `bson:env-uuid`
   Namestring
 }

 There was some concern in the original email thread about whether
 subdocument style _id fields would work with sharding. My research and
 experiments suggest that there is no issue here. There are a few types of
 indexes that can't be used with sharding, primarily multikey indexes, but
 I can't see us using these for _id values. A multikey index is used by
 MongoDB when a field used as part of an index is an array - it's highly
 unlikely that we're going to use arrays in _id fields.

 Hashed indexes are a good basis for well-balanced shards according to the
 MongoDB docs so I wanted to be sure that it's OK to create a hashed index
 for subdocument style fields. It turns out there's no issue here (see
 TestHashedIndex in the gist).

 Using subdocuments for _id fields is not going to prevent us from using
 MongoDB's sharding features in the future if we need to.

 Apart from having to rework the changes already made to the services and
 units collections[2], I don't see any downsides to this approach. Can anyone
 think of something I might be overlooking?

 - Menno


 [1] - subject was RFC: mongo _id fields in the multi-environment juju
 server world

 [2] - this work will have to be done before 1.21 has a stable release
 because the units and services changes have already landed.



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: State should not depend on the API server

2014-09-16 Thread William Reade
On Thu, Sep 11, 2014 at 3:35 PM, Nate Finch nate.fi...@canonical.com wrote:
 I don't see how having different identical structs that are updated
 simultaneously in any way prevents any problems with compatibility.

If we're updating those structs simultaneously, we're completely
missing the point. Once we've defined a pure-data struct that might be
persisted or sent over the wire we *must not change that struct* -- if
we want to send or store different data, we need a new struct.

 Maybe I'm missing something from the proposal, but it seems like this just
 adds busywork without actually solving any problems that weren't caused by
 incorrect use of the code in the first place.

Isn't that tautological? AFAICS, storing a charm.Meta (or a
params.anything) directly in the DB *is* incorrect use of the code,
but nobody realises that until it's too late: that is the problem, and
that's what we're addressing.

 Separation of logic is absolutely a good thing.  Separation of data is not
 nearly so useful.

It's harder than you might think to separate the data from the logic
that acts on it ;).

Cheers
William

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Versioning juju run (juju-run)?

2014-09-16 Thread William Reade
I think this comes down to checking whether the target unit's
agent-version is one that supports invoking relation contexts, and
failing nicely if it isn't -- ie I don't *think* it's to do with the
version of cmd/juju, but the version of cmd/juju*d* in play. Helpful?

On Fri, Sep 12, 2014 at 3:20 PM, Wayne Witzel
wayne.wit...@canonical.com wrote:
 Hi all,

 During the review of my PR https://github.com/juju/juju/pull/705 to add
 --relation to juju run, it was suggested that the cmd/juju be versioned.

 I spoke to a couple folks and was directed to
 https://github.com/juju/juju/pull/746 as an example of versioning an API.
 This makes sense, but I don't see how it applies to the cmd/juju parts of
 the code?

 I'm just failing to understand how to version the command-line options. Also
 the juju run client doesn't seem to have gone through the Facade refactoring
 that the other APIs have gone through.

 I'd appreciate some hand holding here as I'm failing to grok what needs to
 be done.

 Thanks,

 --
 Wayne Witzel III
 wayne.wit...@canonical.com

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Getting the API server URL in workers

2014-09-03 Thread William Reade
On Wed, Sep 3, 2014 at 4:49 PM, John Meinel j...@arbash-meinel.com wrote:
 It feels like this should be something that the API server is looking up in
 the database/some sort of query, not just munging a URL. I'm fine with
 having a Params struct, though I worry that it isn't really appropriate to
 be passing all possible Params to all facades. They already have a State
 connection, I'd like us to at least be designing the layering appropriately.
 I feel like state.State is intended to be where Knowledge resides. Why
 isn't what we want there?

Yeah, constructing it based on the current known addresses of the
state servers seems best. Do we maybe actually want a list of URLs
where the tools are available? That'd be the HA thing to do, I
suspect.

Cheers
William


 John
 =:-

 On Sep 3, 2014 6:03 PM, Andrew Wilkins andrew.wilk...@canonical.com
 wrote:

 I'm working on moving all tools downloads to download from the API server.
 There will are a few APIs that return tools:
   - Upgrader.Tools
   - Client.FindTools
   - Provisioner.FindTools

 These APIs will need to return URLs pointing at the API server. I'm
 intending to change the facade constructors to take a parameter struct, and
 extend it with the API server's root URL (i.e.
 https://address:port/environment/uuid), where the address is the one
 used by the client to connect.

 Any reason I should not go ahead and do that? This will probably make it
 easier to slot in a Restarter or whatever as well.

 Cheers,
 Andrew

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: State should not depend on the API server

2014-09-01 Thread William Reade
On Mon, Sep 1, 2014 at 8:03 AM, John Meinel j...@arbash-meinel.com wrote:
 FWIW I'd favor 3 layers, though it does mean you have to do copying between
 structs that would likely otherwise be almost identical. A State layer for
 saving in the DB, an API layer for communication, and a Model layer for use
 in the Agents/Clients.

Yes please, in general: one set of types per layer boundary.
Independent of what Dave's doing, which is necessary regardless, I
agree with what you're saying: *except* that I think we really have to
consider the API layer to be *two* layers.

That is to say: if you can change some type in api/params and
everything still works, you are Doing It Wrong. We cannot depend on
servers and clients always running the same version -- so, every time
you thus change server/client in a single motion, you're almost
certainly introducing more or less subtle incompatibilities.

So, I would be very pleased if we would stop using the same
definitions (ie api/params) on both sides of the wire -- it's one of
those things that's nice and easy when everyone's running the same
version, but an active trap as soon as multiple versions exist (as
they do).

All the struct-copying is kinda boring, but the layering violations
are straight-up evil.

Dave, thanks very much for doing this.

Cheers
William

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Triggering Server Restarts

2014-08-26 Thread William Reade
On Tue, Aug 26, 2014 at 3:27 AM, Andrew Wilkins
andrew.wilk...@canonical.com wrote:
 It doesn't have to be the tomb itself; it can be an interface which is
 implemented by a type wrapping a tomb. My point was that it seems
 unnecessary to add channels and signalling when we already have that with
 Runner+Tomb.

Yeah, agreed: a minimal implementation where Server just exposed an
interface that got passed into FacadeFactory would be fine, for a
given value of fine. If we're touching those args, though, let's make
it a params struct so we have less pain to deal with next time it
changes ;).

Cheers
William

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Triggering Server Restarts

2014-08-25 Thread William Reade
On Fri, Aug 22, 2014 at 1:10 PM, Andrew Wilkins
andrew.wilk...@canonical.com wrote:
 On Fri, Aug 22, 2014 at 11:57 AM, John Meinel j...@arbash-meinel.com
 wrote:
 I think the problem is that to trigger a restart, the worker itself (in
 this case APIServer) should be raising the error, which isn't really
 accessible in the context of the API call. (you want the API call to return
 successfully, an 'error' in this context is handed to the user, not up the
 Worker stack.) And the state/apiserver/client.Client object just has a
 direct reference to State, it doesn't have a reference to
 state/apiserver.Server to be able to trigger that sort of error in the
 apiserver.Server.tomb object.

Yes, the Server needs to finish with an error that communicates the
need to restart upwards.

In general a facade has references to a number of things, originally
provided by the Server, that it needs to do its job; this STM like
just another dependency like State, Authorizer, etc.

 Is there something wrong with giving Client a reference to the tomb? It
 needs *something*, so I don't see why not that, which would be the least
 amount of work.

Well, the tomb is kinda private. Least work in the *short* term,
maybe, but... no. ;p

At the moment the FacadeFactory type takes a *State, a Resources, and
an Authorizer (plus a string id): it would not be suitable to *just*
add another parameter to that type, 4 is enough, but it would be fine
to add a parameter type and add a Restarter field there.

 I guess it depends if this is an API we *ever* want to trigger at any
 other time than just at the beginning. Maybe we want to have the API so you
 call rollback even if it has been running for a while?

In particular, we want to roll back failed schema upgrades.

Cheers
William

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: First customer pain point pull request - default-hook

2014-08-20 Thread William Reade
On Wed, Aug 20, 2014 at 12:29 AM, Kapil Thangavelu 
kapil.thangav...@canonical.com wrote:

 hmm.. there's three distinct threads here.

 default-hook - charms that do so symlink 0-100% - to one hook.. in
 practice everything, sometimes minus install (as the hook infrastructure
 needs pkgs).. and most typically implemented via dispatch table.


And AFAICS everybody agrees it's nice, and we seem to be approaching
consensus re JUJU_HOOK_NAME.

something-changed - completely orthogonal to either the default-hook merge
 request, and in practice full of exceptions, but useful as an optimization
 of event collapsing around charms capable of event coalescence


Not *entirely* orthogonal -- they do similar *enough* jobs that you
wouldn't want to implement both in the same charm -- but, yeah. I'm happy
to discuss that further (new thread maybe?) but I don't think it's under
immediate consideration anyway.

periodic async framework invoked - metrics and health checks. afaics best
 practices  around these would be not invoking default-hook which is
 lifecycle event handler, while these are periodic poll, yes its possible
 but it conflates different roles.


Not so sure about this. By lifecycle events, do you mean all the events
we have hitherto communicated? I'm not sure the distinction between old
and new will be intuitively clear, and I don't think the consequences of
triggering default-hook are serious -- on that front I reserve my concern
for the introduction of hook contexts with surprising features, but again I
think there's a happy path we can walk by requiring opt-in to such features.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: First customer pain point pull request - default-hook

2014-08-20 Thread William Reade
On Wed, Aug 20, 2014 at 10:46 AM, Matthew Williams 
matthew.willi...@canonical.com wrote:

 Any default-hook that deviated from this pattern could find itself being
 run multiple times in succession - I wonder if that might be confusing/
 unexpected to a charm author?


It'll run multiple times in succession regardless, independent of switching
-- but, yes, unless it switches it'll always do the same thing :). I don't
*think* it's unexpected that we'd run default-hook once for each missing
hook, supplying the substituted hook name every time.


 Gustavo's observation about hooks that the charm might no know about yet
 means that the else clause is absolutely required, I wonder if that's
 obvious to someone who's new to charming?


I'm pretty much adamant that we shouldn't even run new hooks, or expose new
tools, unless the charm explicitly declares it knows about them. But I do
imagine that many implementations will want the else anyway: they don't
need to provide an implementation for every single hook anyway.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: First customer pain point pull request - default-hook

2014-08-19 Thread William Reade
On Mon, Aug 18, 2014 at 9:33 PM, Gustavo Niemeyer gust...@niemeyer.net
wrote:

 I don't think I fully understand the proposal there. To have such a
 something-changed hook, we ought to have a better mechanism to tell
 *what* actually changed. In other words, we have a number of hooks
 that imply a state transition or a specific notification (install,
 start, config-changed, leader-elected coming, etc). Simply
 calling out the charm saying stuff changed feels like a bad
 interface, both in performance terms (we *know* what changed) and in
 user experience (how do people use that!?).


The issue is that as charms increase in sophistication, they seem to find
it harder and harder to meaningfully map specific changes onto specific
actions. Whether or not to react to a change in one relation depends on the
values of a bunch of other units in other relations, to the extent that any
individual relation change can have arbitrarily far-reaching consequences,
and it ends up being easier to simply write something that maps directly
from complete-available-state to desired-config.

This situation seems to be leading more people to actually write (or depend
on) frameworks similar to the one Ben presented in Nuremberg (to, I
thought, a positive reception). Not all charms necessarily want or need to
be written this way -- perhaps the bulk of them shouldn't -- but those that
already *are* end up with really yucky execution patterns: they run the
same code *again and again*, potentially hundreds of times, quite probably
repeatedly bouncing the service  -- and ofc spamming the state server with
flushes for every hook executed, and probably spamming other units with
multiple relation changes; and in general taking much longer to arrive at
the ideal state.

This style of charm evidently *works* right now, it's true, but it's both
yucky and unnecessary to run so many hooks to no purpose. And if some
reasonable subset of charmers are tending that way anyway, let's make sure
juju can take advantage of the situation and use those charms to full
effect.

I understand the underlying problem William is trying to solve but the
 current proposal doesn't seem like a complete solution on its own, and
 it also seems to change the existing understanding of the model
 completely. The proposed default-hooks is a trivial change to the
 existing well known workflow.


It's not *quite* so trivial when you consider unknown future hooks: we need
to be careful that we don't run it in situations where the charm may not be
prepared to deal with the specific features of the hook context. Off the
top of my head:

  * relation-config-changed will have something that looks very much like a
normal relation context, but lacking JUJU_REMOTE_UNIT; the only other
context looking like that is relation-broken;
  * leader-deposed will completely lack hook tools: we can't run a
default-hook there unless we know for sure that the implementation doesn't
depend on any hook tools (in general, this is unlikely).

But if the default substitution only happens for scheduled hooks (it does,
thanks Nate) and if we're careful not to expose new features to charms that
don't know them (which we shall have to be anyway, lest unpredictably ugly
consequences), then we'll be fine.

And FWIW, +100 to $JUJU_HOOK_NAME. The argv stuff is deeply opaque -- sure,
argv[0] is obvious, but argv[1] is not; and it won't even be there when
people invoke it directly via debug-hooks or run, so a clearly-named env
var is surely the right way to go. (Apart from anything else, the
environment is where we put most of the other data pertaining to a
particular context: $JUJU_RELATION_ID, $JUJU_REMOTE_UNIT, etc etc)

Cheers
William


On Sun, Aug 17, 2014 at 2:30 AM, John Meinel j...@arbash-meinel.com wrote:
  I'd just like to point out that William has thought long and hard about
 this
  problem, and what semantics make the most sense (does it get called for
 any
  hook, does it always get called, does it only get called when the hook
  doesn't exist, etc).
  I feel like had some really good decisions on it:
 
 https://docs.google.com/a/canonical.com/document/d/1V5G6v6WgSoNupCYcRmkPrFKvbfTGjd4DCUZkyUIpLcs/edit#
 
  default-hook sounds (IMO) like it may run into problems where we do logic
  based on whether a hook exists or not. There are hooks being designed
 like
  leader-election and address-changed that might have side effects, and
  default-hook should (probably?) not get called for those.
 
  I'd just like us to make sure that we actually think about (and document)
  what hooks will fall into this, and make sure that it always makes sense
 to
  rebuild the world on every possible hook (which is how charm writers
 will be
  implementing default-hook, IMO).
 
  John
  =:-
 
 
 
  On Sat, Aug 16, 2014 at 1:02 AM, Aaron Bentley 
 aaron.bent...@canonical.com
  wrote:
 
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  On 14-08-15 04:36 PM, Nate Finch wrote:
   There's new hook in town: 

Re: First customer pain point pull request - default-hook

2014-08-19 Thread William Reade
On Tue, Aug 19, 2014 at 4:00 PM, Aaron Bentley aaron.bent...@canonical.com
wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On 14-08-19 09:42 AM, Gustavo Niemeyer wrote:
  I have never seen myself a single charm that completely ignores
  all the action cues to simply re-read the whole state from the
  ground up,

 The cs:~juju-qa/precise/juju-reports charm follows this general
 pattern, but there are some specialized hooks: install, start, stop
 and upgrade-charm.


...and in case it's not clear: defined hooks would still continue to run at
the appropriate times, so exceptions can be accommodated. But I think it
becomes valuable as soon as there's any subset of hooks (most likely,
indeed, the relation ones) where the charm's put in a position of
rescanning the world one change at a time -- in which case all such
situations can be handled in batches at a more measured cadence.

config-changed, database-relation-broken, database-relation-changed,
 database-relation-departed, and website-relation-changed are all the
 same code and read the state afresh every time without reference to
 the script name.

 The charmworld charm is similar.


And I'd be most interested to hear from anyone else who is also finding
this mode of interaction with juju to be convenient...

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: First customer pain point pull request - default-hook

2014-08-19 Thread William Reade
On Tue, Aug 19, 2014 at 4:59 PM, Aaron Bentley aaron.bent...@canonical.com
wrote:

 True, I didn't call out the exceptions for the charmworld charm.  For
 completeness, the exceptions in charmworld are:
 install
 nrpe-external-master-relation-changed
 restart


(this isn't actually a hook?)


 reverseproxy-relation-joined
 start
 stop


(out of interest, if started/stopped state were communicated to you any
other way, would you still need these?)


 upgrade-charm
 website-relation-changed


Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Implement system reboot via juju hooks

2014-08-11 Thread William Reade
On Sat, Aug 9, 2014 at 11:19 AM, Stuart Bishop stuart.bis...@canonical.com
wrote:

 I don't think this should be restricted to server reboots. The
 framework is generally useful.


I agree that it's generally useful to be able to defer running things in a
hook context, but I'd prefer to keep this tightly scoped if possible.

I have hooks that need to bounce the primary service so config changes
 can take effect. They can't do that if a long running operation is
 currently in progress, eg. a backup or a replica node is being built.
 Currently, I need to block the hook until such time as I can proceed.
 I think this would be cleaner if I could instead return a particular
 error code from my hook, stating that it is partially complete and
 requesting it to be rescheduled.


I definitely agree that having to block inside hooks is tedious and
generally unhelpful (other units, for example, cannot run their own
independent hooks; machine agents can't (safely) apt-get anything).

So it would be nice if requesting a reboot and requesting a hook to be
 rescheduled are independent things.


+1 to this for sure.

I had wondered if juju-run should allow arbitrary things to be run in
 a hook context later.

 juju-run --after hook /sbin/reboot # queue the reboot command to be
 run after this hook completes.
 juju-run --after hook config-changed  # queue the config-changed hook
 to be run after this hook completes, and after any previously queued
 commands
 juju-run --after tomorrow report-status # Run the report-status
 command sometime after 24 hours.


This *is* cool, but I have a few thoughts.

0) trivially,  don't think juju-run is the right spelling -- possibly
juju-schedule?

1) I think that the reboot case is special enough not to be subsumed by
that command.

2) I'm not clear on the benefit of constructing an additional charm-driven
hook queue, and I worry that it would make the actual operation of the unit
agent alarmingly opaque -- interaction with error states, interaction with
charm upgrades, confusion due to other hooks jumping the queue, etc.

3) For this sort of schedule, might cron be a happier solution?

I'd like to explore your use cases a bit more to see if we can find a clean
solution to your problems that doesn't go too far down the (2) road that
I'm nervous about. (The try-again-later mechanism is much smaller and
cleaner and I think we can accommodate that one pretty easily, fwiw -- but
what are the other problems you want to solve?)

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Implement system reboot via juju hooks

2014-08-11 Thread William Reade
On Mon, Aug 11, 2014 at 3:00 PM, Stuart Bishop stuart.bis...@canonical.com
wrote:

 On 11 August 2014 18:20, William Reade william.re...@canonical.com
 wrote:

  I'd like to explore your use cases a bit more to see if we can find a
 clean
  solution to your problems that doesn't go too far down the (2) road that
 I'm
  nervous about. (The try-again-later mechanism is much smaller and cleaner
  and I think we can accommodate that one pretty easily, fwiw -- but what
 are
  the other problems you want to solve?)

 Memory related settings in PostgreSQL will only take effect when the
 database is bounced. I need to avoid bouncing the primary database:
  1) when backups are in progress.
  2) when a hot standby unit is being rebuilt from the primary.

 Being able to have a hook abort and be retried later would let me
 avoid blocking.


Hmm. The trouble here is that releasing the execution lock would *also*
free up the machine agent to be rebooted -- the benefits of being able to
run other hooks while you wait don't feel quite so compelling to me now.

A locking service would be useful too for units to signal certain
 operations (with locks automatically released when the hooks that took
 them exit). The in-progress update to the Cassandra charm has
 convoluted logic in its peer relation hooks to do rolling restarts of
 all the nodes, and I imagine MongoDB, Swift and many others have the
 same issue to solve.


I see -- to get rolling restarts you'd need to spread an awful lot of
finicky logic across the peer relation hooks. I'm expecting to address this
issue by allowing leaders to run actions on their minions. ie as leader,
you can just run the action and wait for it to succeed or fail before
continuing, all inside a single hook. Sane/helpful?

Cheers
William




 --
 Stuart Bishop stuart.bis...@canonical.com

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: avoiding unnecessarily entering upgrade mode

2014-08-06 Thread William Reade
SGTM too. It should always have worked like this -- rerunning all our
upgrade steps every time is *crazy*.


On Wed, Aug 6, 2014 at 3:19 AM, David Cheney david.che...@canonical.com
wrote:

 SGTM.

 On Wed, Aug 6, 2014 at 11:10 AM, Menno Smits menno.sm...@canonical.com
 wrote:
  Right now, a Juju machine agent is in upgrade mode from the moment it
  starts until the upgrade-steps worker is finished. During this period API
  logins are heavily restricted and most of the agent's workers don't start
  until upgrade mode stops.
 
  This happens even when there is no upgrade to perform. The upgrade-steps
  worker always runs at machine agent startup and upgrade mode is in force
  until it finishes.
 
  Upgrade mode is typically short-lived (say 10 seconds) but if something
 is
  wrong (e.g. mongo issues) the upgrade-steps worker may take longer or not
  finish resulting in the user seeing lots of upgrade in progress
 messages
  from the client and in the logs.
  This is particularly confusing when a user hasn't even requested an
 upgrade
  themselves.
 
  I would like to change the machine agent so that upgrade mode is only
  entered if the version in agent.conf is different from the running
 software
  version. This would mean that upgrade mode is only entered if there is an
  actual upgrade to perform.
 
  The version in agent.conf is only updated after a successful upgrade so
 it
  is the right thing to use to determine if upgrade mode should be entered.
 
  The current behaviour means that the (idempotent) upgrade steps for the
  current version are always run each time the machine agent starts. If the
  change I'm proposing is implemented this will no longer happen. Does this
  seem like a problem to anyone? For instance, do we rely on the upgrade
 steps
  for the current version being run after bootstrap?
 
  The ticket for this work is at: https://bugs.launchpad.net/bugs/1350111
 
  Cheers,
  Menno
 
 
 
  --
  Juju-dev mailing list
  Juju-dev@lists.ubuntu.com
  Modify settings or unsubscribe at:
  https://lists.ubuntu.com/mailman/listinfo/juju-dev
 

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: apiserver/testing.FakeAuthoriser is not a good mock

2014-07-31 Thread William Reade
On Thu, Jul 31, 2014 at 7:30 AM, David Cheney david.che...@canonical.com
wrote:

 Proposal: remove Authoriser.GetAuthEntity() and replace all calls with
 Authoriser.GetAuthTag(). I have a branch for this, it's  30 lines as
 there are only 3 places in the apiserver where we do this and they
 usually call the .Tag method on the entity they get back from
 GetAuthEntity.


SGTM.

2. This extends from point 1, while the svrRoot derives everything
 from svrRoot.entity, the FakeAuthoriser allows the caller to pass a
 unique value for the tag and the entity of the authorisee. This leads
 to impossible situations where the FakeAuthorizer returns nil for
 AuthGetEntity but a non nil value for AuthGetTag. Other permutations
 exist in our tests and are expected by the test logic.

 Propsal: Once step 1 is fixed the difference between svrRoot and
 FakeAuthoriser is the former starts from a state.Entity and derives a
 tag from which other values are derived, the latter skips the initial
 step and starts from a tag. This work falls out of the solution to
 steps 1 and 3.

 3. The helper methods, AuthMachineAgent(), AuthUnitAgent() on the
 Authoriser interface are implemented differently in svrRoot and
 FakeAuthoriser. In tests it is quite common to take the FakeAuthoriser
 from the test suite, copy it and change some of these values leading
 to impossible situations, ie, the tag or entity of the FakeAuthoriser
 pointing to a Unit, but AuthMachineAgent() returning true.

 Proposal: The simplest solution is to copy the implementation of these
 helper methods from svrRoot to FakeAuthoriser. A more involved
 solution would be to factor these methods out to be functions in the
 apiserver/common package that take an Authorizer. This second step may
 not pay for itself.


I would strongly favour the latter solution. The benefit of having a fake
authorizer whose behaviour diverges is that (at least in theory) it allows
for comprehensive testing against arbitrary authorizer behaviour;
constraining that behaviour so we're only testing realistic situations will
be great, but I fear that doing so by copy-pasting code will only lead to
more divergences in future. Let's make sure we're using the same logic
across the board.

Cheers
William

These steps resolves the majority of the discontinuity between the two
 implementations and will resolve a set of blocking issues I've hit
 converting the apiserver and state packages to speak tags natively.

 Thanks for your time

 Dave

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: backup API in 1.20

2014-07-30 Thread William Reade
On Wed, Jul 30, 2014 at 8:25 AM, John Meinel j...@arbash-meinel.com wrote:

 So as discussed, I think we still want to support a HTTP GET based API for
 actually downloading the content to the client. And the CLI can do steps
 like:
  RPC to a request to create a backup (either this is synchronous and
 returns a URL, or the next step is to wait on the next request for when it
 is done to get the URL)
  HTTP GET to download the backup .tar (.xz,,bz2,gz whatever)
 put those bytes somewhere locally and then exit.


Absolutely agreed.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: backup API in 1.20

2014-07-29 Thread William Reade
On Tue, Jul 29, 2014 at 6:57 PM, roger peppe rogpe...@gmail.com wrote:

 On 29 July 2014 16:50, Eric Snow eric.s...@canonical.com wrote:
  The API server side of backup made it into 1.20 (the client-side and
  CLI did not).  However, the API is exposed via HTTP POST rather than
  through websockets RPC.

 An HTTP POST request seems about right for a call that
 streams a significant amount of data. Is there any particular
 reason this has to change?


It's not what we discussed and agreed in vegas; and, crucially, it fails to
make previously-made backups available from any state server. We want WS
APIs for creating, listing, and deleting backups; and, yes, we want to
*download* those backups over HTTP, and the CLI will be creating them,
waiting for them to be ready, and then downloading them over HTTP. But a
POST to just create-a-backup-and-download-it-and-forget-about-it is not
what we want or need.

WRT the proposal:

Given that we explicitly flagged the POST API as experimental, and we
didn't release a client that actually used it, I think we're safe fixing
and backporting (although, remember, the 1.18-based backup/restore needs to
continue to work -- this lets us handle only *two* mechanisms rather than
three, it doesn't let us use just one).

But I'm a bit suspicious... would someone please confirm that we don't have
*any* released clients that use the POST API? The above is predicated on
that assumption.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: api/cli compatability between juju minor versions

2014-07-25 Thread William Reade
On Sat, Jul 26, 2014 at 7:56 AM, Curtis Hovey-Canonical 
cur...@canonical.com wrote:

 Ladies and Gentlemen.

 The Juju QA meet with Robie Basak to discuss backporting juju 1.20.x
 to trusty. Robie was planning to supersede 1.18.1 with 1.20.2. I
 suffered an anxiety attack.

 My current understanding is that we promise API/CLI compatibility
 between one minor version and the next. Features are can be
 deprecated, and they can be removed in the next minor release. Thus
   1.18.1  1.20.2 will work
   1.18.1  1.21.0 may not work


I'm rather bothered that this notion still has currency. We could get away
with that in the past, but juju is in trusty, and we can't go breaking
people now. Any significant changes to the CLI or API need to be saved up
for a juju 2.0.

It's fine to add new features, so long as they fail gracefully when
attempted against older environments -- but if you're removing or changing
an API, or removing or changing the meaning of a command line flag, you are
Doing It Wrong. Developers, team leads, since there has apparently been
some lack of clarity: be told, again, loudly:

Thou Shalt Not Break Compatibility With 1.18. We Are Stuck With It.

This scenario outline what can happen:

 An enterprise machine uses trusty and juju 1.18.1 It manages several
 permanent 1.18.x environments [1]. The machine also has several juju
 scripts to create and destroy juju envs to do ephemeral work.
 The machine doesn't get package updates from 1 year.

 Package updates are enabled; juju 1.25.0 is installed. The users of
 the machine didn't see the package was updated, they know nothing of
 the new features. Can they still manage there eternal 1.18.x envs? Do
 the
 automated scripts continue to create and destroy ephemeral envs [2]

 As Ubuntu promises stability within a series, we cannot replace a well
 understood juju with newer version of juju that redefines or removed
 features.

 I know that the client is responsible for provisioning during
 bootstrap and add-machine. Redefinitions of behaviours can happen. I
 think they already have. For example, I see
 commit 5d0c7aeb7d3049672df587cb11f455465cce4a57
[bug=1302005] cmd/juju: deprecate --series, added --upload-series
 introduced in 1.19.2 as a planned change. 1.20.2 still supports
 --series. I can see that 1.21-alpha1 still supports --series, but when
 is --series obsoleted?


I don't think we can drop --series.

I also don't think any juju code can ever be relied upon to provision any
juju version different to itself, which may indicate a subtle bug around
client add-machine for the manual provider... Andrew, do you think it could
be reasonable to push that into the provisioner? We'd need to distribute
the env public key to the machines being added, but that doesn't seem like
the end of the world...

If Juju is going to promise api/cli compatibility support for more
 than 1 minor version, how long do we support? 2 years? Precise is more
 than 2 years and we see the series is still popular.


LTS. 5 years. I tend to abbreviate this in casual conversation as forever
because it seems to put people in the right mindset.

Eventually, Juju will want to break compatibility, maybe it will need
 to. Ubuntu will need to create co-installable packages. Users will
 knowingly install newer version of juju to get new features. Old
 versions will still be installed an usable. User can reinstall old
 versions if they discover compatibility issues they cannot mitigate
 quickly.


Those breaks should be saved up for major version upgrades.

Cheers
William



 [1] We know that trusty 1.18.1 client users are deploying and
 upgrading envs to 1.18.4. We know these version are compatible

 [2] The new envs will be 1.24.x

 --
 Curtis Hovey
 Canonical Cloud Development and Operations
 http://launchpad.net/~sinzui

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: series-agnostic charm URLs

2014-07-24 Thread William Reade
On Thu, Jul 24, 2014 at 12:59 AM, roger peppe roger.pe...@canonical.com
wrote:

 On 23 July 2014 11:51, John Meinel j...@arbash-meinel.com wrote:
  Casey is the one to talk to. I'm not sure if he was just trying to be
  conservative, but he did intentionally not want to pun the types. I
 think he
  wanted to be clear that things passed to *these* API calls must be fully
  formed, while ones passed to *those* could be expanded.
  We could do what you say, but it starts meaning every API that takes a
 URL
  has to start worrying about expanding it late.
  Since it is no longer charm.URL but really charm.MaybeURL.

 Luckily there are no APIs that currently take a URL type as an
 argument. The only ones that do (ServiceDeploy, for example)
 currently use a string, because they allow a partially specified
 URL, so can't use charm.URL.


WTF? We have charm.InferURL *and* charm.ParseReference for the cases where
sloppy input is reasonable -- charm.ParseURL should *parse a URL*, and fail
if not given a string that represents a URL.

And ServiceDeploy should *definitely* not accept sloppy input: I'm certain
we had exactly this argument back in the early API days, and looking at the
code STM to imply that a reasonable attempt was made to follow my
instructions -- ie ServiceDeploy will fail if the charm revision is not
specified, as it *should* if the schema or series or anything else is left
unspecified.

Anyway, someone changed some code and introduced a bug that got missed in
review; it happens. However, someone now needs to go into the charm package
and add some tests to ParseURL to ensure it errors as it should when
schema, or series, is omitted.

Note that not even charm.URL is itself currently necessarily fully
 specified - the revision can be omitted.


It's a valid charm URL with or without a revision. It is *not* a valid
charm URL without a schema, series, or name. These definitions have not
changed in years.

On 23 July 2014 13:01, Gustavo Niemeyer gust...@niemeyer.net wrote:
  On Wed, Jul 23, 2014 at 7:35 AM, roger peppe rogpe...@gmail.com wrote:
  We want to store charm URLs in mongo-db that are agnostic whether
  the series is specified or not. For example, in a bundle, a service
  is free to specify a series in the charm name or not.
 
  That sounds slightly surprising. How do we plan to define what the
  bundle actually means?

 The charm URL in a bundle means exactly what it would mean if
 you typed it in a juju deploy command. That is, it is dependent
 on the charms available at bundle deploy time.
 An unspecified revision uses the latest available revision;
 an unspecified series uses the latest LTS series that's
 available for the charm.


This makes me shudder -- why would we want to encourage distribution of a
format encoding vague guesses as to validated best practice, rather than
encoding *actual* validated best practice? `juju deploy foo` is great for
demos and exploration, but I'd expect any serious user to value
reproducibility enough to specify the actual charm precisely; and for them
to expect that the bundles they use would do the same.

 As noted above, a Reference may not have a schema as well, so this
  suggestion seems to imply that foo becomes a valid URL.

 Yes, both ParseURL and ParseReference both already allow
 the schema to be omitted, defaulting to cs: if so.

  Maybe having just URL could be made cleaner, though. This should
  be judged based on a more detailed proposal.

 I do believe having just URL would be significantly cleaner.
 What area would you like to see more detail on?


Fuzziness over the meaning of URL appears to have already broken ParseURL,
and hence ServiceDeploy; let's not double down on that. Even if the bundle
format's sloppiness is already entrenched, that's no reason for juju-core
to give up and just start accepting whatever -- the clients can damn well
rewrite sloppy bundles before using them with juju, just as they should,
and do (or at least *did*... who knows what's happened recently since
validation got broken) rewrite sloppy charm URLs before passing them to
juju.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Charm store API proposal, new version

2014-07-16 Thread William Reade
On Wed, Jul 16, 2014 at 12:10 PM, Richard Harding 
rick.hard...@canonical.com wrote:

 I'm not against having any bulk api calls but we've got a handful of
 clients and the one use case we've found for them is Aaron's work which I
 think we can better address with our current scheme anyway as he only
 needed the bulk api call to get data he'd already have on hand.


I'd suggest at least one other use case -- namely, that juju-core wants to
know which deployed charms have updates available, and it's annoying to
issue N requests for N charms.

But the major drivers are:

1) consistency, so that clients across the board can just *always know*
that they can issue a given request in N-plicate and never have to mess
around hunting for which variant of a given api call offers what they're
looking for;

2) expressivity, because you can always express a single request in a bulk
call but can't go the other way;

3) agility(?), in that bulk calls have scope for server-side optimisations
that would be very very hard to apply across a list of single calls, and
we'd rather address performance issues via implementation tweaks than API
churn; and

4) past experience, both others' (launchpad) and our own, leading us to
believe that it's generally nicer to work with chunky APIs than chatty
ones, despite the slightly steeper initial learning curve.

...not to mention that IMO most of the arguments for single calls represent
a failure of imagination. I certainly agree that it's not always obvious
why someone might want to discover the latest revisions of N charms at
once, or the metadata for M other charms that may or may not be directly
related to one another, or reporting info for a bunch of others (or, in
juju-core terms, why we might want to find out the life statuses of a bunch
of entities, or why the set of state server addresses might vary according
to what agent needs to connect, or why a client might want provisioning
info for a whole bunch of new machines at a time).

But simple prudence dictates that we pay the minimal costs of enabling such
behaviours early, in the interest of providing a rich experience for all
our clients, rather than scrambling to fix them up later as we discover the
use cases that always somehow seem terribly predictable in hindsight -- and
in doing so take on the long-term costs of uglified and non-orthogonal APIs
(or, perhaps even worse, failing to do so and leaving our clients hanging
because we're focused on other things).

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Current handling of failed upgrades is screwy

2014-07-15 Thread William Reade
FWIW, we could set some error status on the affected agent (so users can
see there's a problem) and make it return 0 (so that upstart doesn't keep
hammering it); but as jam points out that's not helpful when it's a
transient error. I'd suggest retrying a few times, with some delay between
attempts, before we do so (although reporting the error, and making it
clear that we'll retry automatically, is probably worthwhile).

And, really, I'm not very keen on the prospect of continuing to run when we
know upgrade steps have failed -- IMO this puts us in an essentially
unknowable state, and I'd much rather fail hard and early than limp along
pretending to work correctly. Manual recovery of a failed upgrade will
surely be tedious whatever we do, but a failed upgrade won't affect the
operation of properly-written charms -- it's a management failure, so you
can't scale/relate/whatever, but the actual software deployed will keep
running. However, I can easily imagine that continuing to run juju agents
against truly broken state could lead to services actually being shut
down/misconfigured, and I think that's much more harmful.

Cheers
William


On Thu, Jul 10, 2014 at 9:57 AM, John Meinel j...@arbash-meinel.com wrote:

 I think it fundamentally comes down to is the reason upgrade failed
 transient or permanent, if we can try again later, do so, else log at
 Error level, and keep on with your life, because that is the only chance of
 recovery (from what you've said, at least).

 John
 =:-


 On Thu, Jul 10, 2014 at 11:18 AM, Menno Smits menno.sm...@canonical.com
 wrote:

 So I've noticed that the way we currently handle failed upgrades in the
 machine agent doesn't make a lot of sense.

 Looking at cmd/jujud/machine.go:821, an error is created if
 PerformUpgrade() fails but nothing is ever done with it. It's not returned
 and it's not logged. This means that if upgrade steps fail, the agent
 continues running with the new software version, probably with partially
 applied upgrade steps, and there is no way to know.

 I have a unit tested fix ready which causes the machine agent to exit (by
 returning the error as a fatalError) if PerformUpgrade fails but before
 proposing I realised that's not the right thing to do. The agent's upstart
 script will restart the agent and probably cause the upgrade to run and
 fail again so we end up with an endless restart loop.

 The error could also be returned as a non-fatal (to the runner) error
 but that will just cause the upgrade-steps worker to continuously restart,
 attempting the upgrade and failing.

 Another approach could be to set the global agent-version back to the
 previous software version before killing the machine agent but other agents
 may have already upgraded and we can't currently roll them back in any
 reliable way.

 Our upgrade story will be improving in the coming weeks (I'm working on
 that). In the mean time what should we do?

 Perhaps the safest thing to do is just log the error and keep the agent
 running the new version and hope for the best? There is a significant
 chance of problems but this is basically what we're doing now (except
 without logging that there's a problem).

 Does anyone have a better idea?

 - Menno





 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Devel is broken, we cannot release

2014-07-15 Thread William Reade
On Tue, Jul 15, 2014 at 2:51 PM, Wayne Witzel wayne.wit...@canonical.com
wrote:

 If we aren't stopping the line when our CI is in the red, then what is the
 point of even having CI at all? If we are not prepared to adjust the
 culture of our development. To truly halt forward progress in favor of
 chasing down regressions then I struggle to find the benefit that CI and
 testing is giving us at all. Just confirming that master is broken and we
 are still ignoring it? If master is broken, we all our broken. No
 development you are doing should proceed that is based on a broken master.
 That work cannot at any point be considered in good working condition. A
 problem in master is everyone's problem.

 Bugs that are found throughout the normal operations and usage of juju
 should be assigned a priority and queued, but regression is a sign of a
 greater problem that should be resolved immediately. Allowing regressions
 to not stop the line is taking the stance that we don't care about
 deterioration in our code base.


+100 to this. Regressions are a Big Deal and should be treated as such;
leaving other merges queued until such a time as the regression is fixed
(or backed out for rework) is entirely reasonable (and I think we've got
enough evidence that the alternative really doesn't fly effectively).

Cheers
William




 On Tue, Jul 15, 2014 at 9:37 AM, Nate Finch nate.fi...@canonical.com
 wrote:

 I don't think we need to stop the world to get these things fixed.  It is
 the responsibility of the team leads to make sure someone's actively
 working on fixes for regressions.  If they're not getting fixed, it's our
 fault.  We should have one of the team leads pick up the regression and
 assign someone to work on it, just like any other high priority bug.



 On Mon, Jul 14, 2014 at 3:05 PM, Curtis Hovey-Canonical 
 cur...@canonical.com wrote:

 Devel has been broken for weeks because of regressions. We cannot
 release devel. The stable 1.20.0 that we release is actually older
 than it appears because we had to search CI for an older revision that
 worked.

 We have a systemic problem: once a regression is introduced, it blocks
 the release for weeks, and we build on top of the regression. We often
 see many regressions.The regression mutate as people merge more
 branches.

 The current two regressions are:
 * win juju client still broken with unknown
   from  2014-06-27 which has varied as a compilation
   problem or panic during execution.
   https://bugs.launchpad.net/juju-core/+bug/1335328

 * FAIL: managedstorage_test trusty ppc64
   from 2014-06-30 which had a secondary bug that broke compilation.
   https://bugs.launchpad.net/juju-core/+bug/1336089

 I think the problem is engineers are focused on there feature. They
 don't see the fallout from their changes. They may hope the fix will
 arrive soon, and that maybe someone else will fix it.

 I propose a change in policy. When a there is a regression in CI, no
 new branches can be merged except those that link to the blocking bug.
 This will encourage engineers to fix the regression. One way to fix
 the regression is to identify and revert the commit that broken CI.


 --
 Curtis Hovey
 Canonical Cloud Development and Operations
 http://launchpad.net/~sinzui

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev




 --
 Wayne Witzel III
 wayne.wit...@canonical.com

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: RFC: mongo _id fields in the multi-environment juju server world

2014-07-04 Thread William Reade
My expectation is that:

1) We certainly need the environment UUID as a separate field for the shard
key.
2) We *also* need the environment UUID as an _id prefix to keep our
watchers sane.
2a) If we had separate collections per environment, we wouldn't; but AIUI,
scaling mongo by adding collections tends to end badly (I don't have direct
experience here myself; but it does indeed seem that we'd start consuming
namespaces at a pretty terrifying rate, and I'm inclined to trust those who
have done this and failed.)
2b) I'd ordinarily dislike the duplication across the _id and uuid fields,
but there's a clear reason for doing so here, so I'm not going to complain.
I *will* continue to complain about documents that duplicate info across
fields in order to save a few runtime microseconds here and there ;).

If someone with direct experience can chip in reassuringly I *might* be
prepared to back off on the N-collections-per-environment thing, but I'm
certainly not willing to take it so far as to separate the txn logs and
thus discard consistency across environments: I think there will certainly
be references between individual hosted environments and the initial
environment.

So, in short, I think Tim's (1) is the way to go. But *please* don't
duplicate data that doesn't have to be -- the UUID is fine, the name is
not. If we really end up spending a lot of time extracting names from _id
fields we can cache them in the state documents -- but we don't need
redundant copies in the DB, and we *really* don't need to make our lives
harder by giving our data unnecessary opportunities for inconsistency.

Cheers
William



On Fri, Jul 4, 2014 at 6:42 AM, John Meinel j...@arbash-meinel.com wrote:

 According to the mongo docs:
 http://docs.mongodb.org/manual/core/document/#record-documents
 The field name _id is reserved for use as a primary key; its value must
 be unique in the collection, is immutable, and may be of any type other
 than an array.

 That makes it sound like we *could* use an object for the _id field and do
 _id = {env_uuid:, name:}

 Though I thought the purpose of doing something like that is to allow
 efficient sharding in a multi-environment world.

 Looking here: http://docs.mongodb.org/manual/core/sharding-shard-key/
 The shard key must be indexed (which is just fine for us w/ the primary
 _id field or with any other field on the documents), and The index on the
 shard key *cannot* be a *multikey index
 http://docs.mongodb.org/manual/core/index-multikey/#index-type-multikey.*
 I don't really know what that means in the case of wanting to shard based
 on an object instead of a simple string, but it does sound like it might be
 a problem.
 Anyway, for purposes of being *unique* we may need to put environ uuid in
 there, but for the purposes of sharding we could just put it on another
 field and index that field.

 John
 =:-



 On Fri, Jul 4, 2014 at 5:01 AM, Tim Penhey tim.pen...@canonical.com
 wrote:

 Hi folks,

 Very shortly we are going to start on the work to be able to store
 multiple environments within a single mongo database.

 Most of our current entities are stored in the database with their name
 or id fields serialized to bson as the _id field.

 As far as I know (and I may be wrong), if you are adding a document to
 the mongo collection, and you do not specify an _id field, mongo will
 create a unique value for you.

 In our new world, things that used to be unique, like machines,
 services, units etc, are now only unique when paired with the
 environment id.

 It seems we have a number of options here.

 1. change the _id field to be a composed field where it is the
 concatenation of the environment id and the existing id or name field.
 If we do take this approach, I strongly recommend having the fields that
 make up the key be available by themselves elsewhere in the document
 structure.

 2. let mongo create the _id field, and we ensure uniqueness over the
 pair of values with a unique index. One think I am unsure about with
 this approach is how we currently do our insertion checks, where we do a
 document does not exist check.  We wouldn't be able to do this as a
 transaction assertion as it can only check for _id values.  How fast are
 the indices updated?  Can having a unique index for a document work for
 us?  I'm hoping it can if this is the way to go.

 3. use a composite _id field such that the document may start like this:
   { _id: { env_uuid: blah, name: foo}, ...
 This gives the benefit of existence checks, and real names for the _id
 parts.

 Thoughts? Opinions? Recommendations?

 BTW, I think that if we can make 3 work, then it is the best approach.

 Tim

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list

Re: Actions document - suggested changes

2014-06-27 Thread William Reade
On Thu, Jun 26, 2014 at 6:47 AM, Tim Penhey tim.pen...@canonical.com
wrote:

 On reading the spec[1] and looking at the state documents as they are in
 trunk now, I was quickly coming to the conclusion that the documents
 need to change.


I agree with that, but...



 Right now we have two action related documents:
 ...
 I think these should be combined,


Strong -1. We've made this mistake too many times in the past; we need to
group the fields in separate docs according to who needs to read/write
them, so we don't end up with embarrassments like the N^2 add-unit problem.


 and there are quite a few missing
 fields that are clearly expected from the spec:

 invoked:  TIME
 started:  TIME
 finished: TIME
 files:
 - this is a tricky one

 Obviously we are also missing the user that asked for the action.


Agreed. These fields will come when we're ready to use them.

Consider the following structure:

 type actionDoc struct {
  ...
 // UnitName is the name of the unit that the action will
 // run on
 UnitName string


Not necessarily a unit. We need service actions, and probably one day stack
actions (even if they're likely to be represented as service actions).

 ...
 // User is the user that requested this action
 User string


Not necessarily a user. I think there's a class of service-level action
that will only be possible if the leader has a way of invoking actions on
its followers.

 ...
 // ReturnCode is the result of the action
 ReturnCode *int


Not so convinced that this is really what we want, more on this later.

// Results are defined and set by the action themselves.
 Results map[string]interface{}

 // Files is interesting, in that they should probably live
 // in the environment file storage, what was cloud storage
 // and is now GridFS, under some particular directory:
 // perhaps  /actions/uuid/filename
 // Both stdout and stderr are recorded as files.  They are
 // only added if they are not empty.
 // Other files can be added by the action.
 Files []string


I'm concerned that this overlaps unhelpfully with other features. Hook
output is already logged elsewhere, and I think I'd prefer to just make
sure they're badged properly so that they're easily accessible via
debug-log.

Other files, and/or the possibility of multiple output documents, feel like
they're an independent feature. Sorry I didn't flag this before, but that
spec has had some interesting changes since I +1ed it -- in short, I'm
concerned that we're sacrificing orthogonality for the sake of gilding the
actions feature. I think a single output map for the action results will
give us a lot of value, and I anticipate that a separate output feature --
in which we can store structured maps, or binary blobs -- is valuable in
many contexts, not just in actions.


 status becomes an emergent property:
   if Started is nil
 pending
   if Finished is nil
 started
   if ReturnCode is 0 (not nil)
 success
   else
 failure


Up to a point, yes, but I think there's a bit more granularity to status.
There's also the possibility that the unit agent went down mid-action; and
I wouldn't be surprised if we came up with other edge cases. Basically I
think we really do want Status instead of ReturnCode; while we could
plausibly infer pending/started statuses from other fields, but I'm not
sure it's *that* much of a win -- we may as well set started when we set
the started time, and then we can just get the status by looking without
having to infer anything. (Although we would have to infer pending from the
lack of an ActionResult anyway.)

Queues, Lists, and Results are just queries across this document set for
 the environment, optionally scoped to unit names and users.


Not sure how to watch them. Remember that we can't really do db queries
inside watches, because any blocked watcher blocks the whole watching
infrastructure. This is not to say that we *can't* get around it, but it's
a bunch of complexity for relatively little benefit IMO; and people
generally find the watchers hard enough to deal with without adding
additional layers.

Does that seem reasonable to other people?


Some of it does, yeah. My overriding concern is about just sticking
everything in the same doc just because they're conceptually related: I
really don't think it's a good idea.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Relation addresses

2014-06-19 Thread William Reade
On Thu, Jun 19, 2014 at 12:16 AM, Kapil Thangavelu 
kapil.thangav...@canonical.com wrote:

 On Wed, Jun 18, 2014 at 5:04 PM, William Reade 
 william.re...@canonical.com wrote:

 We're not changing any semantics; apart from anything else,
 relation-broken does not depend on any remote unit.


 its true it doesn't depend on one related unit, because its a change about
 all of them. ie. relation-broken is an event coalesce that effectively for
 all remote units departed and are never coming back,  a relation-hook
 executing for a self/local unit change as opposed to remote still feels
 like a different semantic.


Used to be; hasn't been for a couple of years now. I'm not sure it'd be a
bad thing to reinstate the pyjuju behaviour, but I think that's orthogonal
regardless.

And, in a networked world, the unit does not necessarily have a single
 private address: so it's not a unit-scoped event. If we try to pretend it
 is, we avoid a small amount of confusion now at the cost of much more
 confusion in the imminent future.



 so what happens when i have an address change on a unit  that's not in a
 network scoped relation on that interface, the unit never gets informed of
 the address change? what if i have an unscoped relation that was casually
 using that interface?  ie. not having a single private address to me is why
 its a unit scope change. It still feels like the issue is still we're
 mixing future unimplemented features with current bug fixes.


Not sure I follow: every relation is implicitly scoped to the network its
private-address is on anyway. There's only the one private network today,
so it's not immediately obvious, but it's still true. If you're not in a
relation, yes, you won't get informed by a relation-address-changed.

But with the data-bag write alone, the unit never gets informed anyway. If
it's lucky it runs some unrelated hook and thinks to check, but that's
pretty weak: our model is that we *tell* units about the changes they need
to respond to. For a sane long-term solution, this needs to be tied to a
hook. I'm potentially open to arguments that address changes on interfaces
not used by relations should also generate notifications... but (1) that's
still a hook, even if it's a pre-existing one and (2) my gut says that a
charm using such interfaces is playing silly games anyway, and I'm not sure
we should go out of our way to accommodate it.

And: yes, I'd love to find a clean fix that doesn't hamper future plans,
but I'm not sure that's what we're getting.


 thanks, that scenario would be useful to have in the spec doc. As long as
 we're talking about unimplemented features guiding current bug fixes,
 realistically there's quite a lot of software that only knows how to listen
 on one address, so for network scoped relations to be more than advisory
 would also need juju to perform some form of nftables/iptables mgmt. Its
 feels a bit slippery that we'd be exposing the user to new concepts and
 features that are half-finished and not backwards-compatible for proxy
 charms as part of a imo critical bug fix.


 I understand this is an important bug fix. But it's not a *simple* bug
 fix; it's intimately bound up with the changes imposed by the improvements
 to the network model.


 its a bit unclear if that's the case... could you elaborate?   else how
 are we populating the value in the first place? and what makes it hard to
 update that value when we're already recording the updates/changes into
 state?


OK: the state server discovers that a machine's address has changed. We can
find all units on that machine, and all their relation data bags; we can
find the old private-address values that match, and we can overwrite them.
That's simple in isolation -- and in fact it actually stays simple in the
networked relation case, because it just means we have fewer bags we need
to check/update. So I was wrong there, it's not specifically about the
networks; and that's good. But.

The trouble comes when we have to integrate it with what the unit agent's
doing: we can't assume that the unit isn't running a hook, and if it *is*
then when that hook completes it's going to overwrite its data bag, quite
likely blowing away the changes made on the state server. The only way I
can see to work around this reliably is to eliminate the race by making the
unit agent responsible for the update in the first place. (I accept that if
the units don't overwrite unchanged values we'll *usually* be ok. But once
we open the door to concurrent relation settings writes I think that
usually isn't good enough.)

And if we do that we *still* have issues: the unit agent needs to
persistently record what it last thought its address was, so that when it
comes up after reboot and sees a new address it knows it has a new one, and
knows what the old one was so it can only overwrite the correct ones; and
if the address changes while it's running it needs to be notified by the
state server, which means a watcher; and before

Re: Relation addresses

2014-06-19 Thread William Reade
On Thu, Jun 19, 2014 at 5:14 AM, Andrew Wilkins 
andrew.wilk...@canonical.com wrote:

 If that's the case, why do we not just require charms to implement an
 address-changed hook to update the relation setting then?
 That way we don't break existing proxy charms, but other charms won't be
 fixed until they implement that hook. We'd have to keep the initial
 private-address value for backwards compatibility at least initially.

 So how about this as an alternative proposal:
  - Add relation-address-changed, but don't add automatic updating of
 private-address (after initial setting).
  - Continue to use unit-get private-address. When we have networks, then
 we can add -r and have it default to the current relation.

 Pros:
  - Nothing to do on upgrade, no messy error-prone logic
  - No network specific bits added until they're added
  - No charms will be broken any more than they already are
 Cons:
  - Existing non-proxy charms will need to be changed to take advantage of
 the fix


Still not ideal, but model-wise it'd work for me. Also covers the odd weird
case where an interface uses host instead of private-address. Kapil?
Charmers? I understand that the extra work is unattractive, but I really
don't want to depend on racy scary magic, and I can't shake the feeling
that automatic rewrites will always take us there.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Relation addresses

2014-06-18 Thread William Reade
On Tue, Jun 17, 2014 at 5:35 PM, Kapil Thangavelu 
kapil.thangav...@canonical.com wrote:

 or another unit level change hook (unit-address-changed), again the
 concerns are that we're changing the semantics of relation hooks to
 something fundamentally different for this one case (every other relation
 hook is called for a remote unit) and that we're doing potentially
 redundant event expansion and hook queuing as opposed to
 coalescing/executing the address set change directly at the unit scope
 level.


We're not changing any semantics; apart from anything else, relation-broken
does not depend on any remote unit.

And, in a networked world, the unit does not necessarily have a single
private address: so it's not a unit-scoped event. If we try to pretend it
is, we avoid a small amount of confusion now at the cost of much more
confusion in the imminent future.

The reason it is called for each associated unit is because the network
 model means we can actually have different addresses (be connected on a
 different network) for different things related to me.

 e.g. I have a postgres charm related to application on network A, but
 related to my-statistics-aggregator on network B. The address it needs to
 give to application should be different than the address given to
 my-statistics-aggregator. And, I believe, the config in pg_hba.conf would
 actually be different.


 thanks, that scenario would be useful to have in the spec doc. As long as
 we're talking about unimplemented features guiding current bug fixes,
 realistically there's quite a lot of software that only knows how to listen
 on one address, so for network scoped relations to be more than advisory
 would also need juju to perform some form of nftables/iptables mgmt. Its
 feels a bit slippery that we'd be exposing the user to new concepts and
 features that are half-finished and not backwards-compatible for proxy
 charms as part of a imo critical bug fix.


I understand this is an important bug fix. But it's not a *simple* bug fix;
it's intimately bound up with the changes imposed by the improvements to
the network model. If everybody's just binding to 0.0.0.0 (which I accept
they probably are, and will perhaps be for a long time) it still does us,
and them, no harm to report the address they *should* bind to. And for the
software that can do better, it does no harm for them to bind to the
correct addresses before the model is fully in place.

there's lots of other implementation complexity in juju that we don't leak,
 we just try to present a simple interface to it. we'd be breaking existing
 proxy charms if we update the values out from the changed values. The
 simple basis of update being you touched you own it and if you didn't it
 updates, is simple, explicit, and backwards compatible imo.


The trouble is that it's tricky to track whether or not it was touched --
especially for upgraded environments. I'd rather impose some well-flagged
breakage on the proxies -- a minority of charms -- than introduce a complex
and subtly-flaky mechanism (that will only fly for a little while) on all
charms.

There's also the question of why the other new hook (relation-created) is
 needed or how it relates to this functionality, or why the existing
 unit-get private-address needs to be supplemented by address-get.


relation-created is there because I'm concerned that
relation-address-changed will otherwise become the de facto
relation-created hook -- as it is there's no place for one-time relation
setup, and people currently *have* to do it in the first relation-joined.
Giving them another option that isn't quite right will only make it harder
to figure out what's going on in a simple charm.

WRT unit-get: `unit-get private-address` can either grow a -r flag (with an
implicit value in relation hooks), and thus sometimes change its meaning in
existing charms; or we can introduce a new tool whose meaning is always
clear. It seems friendlier to preserve the existing behaviour of unit-get.

In general, I think that *some* degree of hook-proliferation is not such a
bad thing, but I understand that it could be taken too far. The strongest
argument against rel-addr-changed that I can see is that there may be other
relation-scoped changes that would fit, and that we'd want a single hook --
as in the leader discussion, in which there seemed to be general agreement
that leader-changed could be rolled into config-changed. In particular,
then, the mooted relation-config-changed hook would perhaps suffice.
Thoughts?

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Relation addresses

2014-06-18 Thread William Reade
On Wed, Jun 18, 2014 at 7:05 PM, Kapil Thangavelu 
kapil.thangav...@canonical.com wrote:


 addresses are just keys in a unit relation data bag. relation-get is the
 cli tool to retrieve either self or related units databag key/values (ie
 for self's address in the rel $ relation-get $JUJU_UNIT_NAME
 private-address). unit-get is used to retrieve the current iaas properties
 of a unit.


Yes: unit-get retrieves iaas properties; and relation-get retrieves unit
properties; but self's private-address is *not* put in self's relation data
bag for the benefit of self; it's for the remote units that *react* to
changes in that data bag. Using `relation-get $JUJU_UNIT_NAME
private-address` is Doing It Wrong: the canonical way to get that data is
`unit-get private-address`, and the problem is not that we don't magically
update the relation data bag: the problem is that we don't provide a means
to know when the relation's data bag should be updated.

Honestly, it's kinda bad that we prepopulate private-address *anyway*. It's
helpful in the majority of cases, but it's straight-up wrong for proxy
charms. I don't want to take on the churn caused by reversing that
decision; but equally I don't want to fix it with magical rewrites of the
original magic writes.

my point regarding binding and addresses was more that we're forward
 thinking bug fixes by introducing a bunch of user facing stuff without
 having completely thought/designed or started implementation on the
 proposed solution that is the reason we're exposing additional things to
 users. instead i'd rather we just fix the bug, and actually implement the
 feature when we get around to implementing the feature.  By the time we get
 around to implementing it (which for this cycle is against a single
 provider) we may have a different implementation and end-user exposed
 surface in mind.


That's not impossible; but I don't think it's a good reason to pick an
approach at odds with our current best judgment of where we're heading.

moreover the  user facing (charm author) aspects of the changes as
 currently in the spec are going to be confusing, ie. relation hooks are
 always called for remote units, except for this one case which is special.


I don't agree that this one case is special; relation hooks are called in
response to changes in a local unit's view of a relation, and those changes
have hitherto *mostly* been about remote units. But I don't think it's
fundamental -- and I'm not even sure it's very natural, I've corrected
misconceptions about -joined referring to the local unit more than once.

additionally i'd prefer we have a plan for maintaining backwards
 compatibilities with proxy charms that are already extant.


OK, let's talk about that :). Would a charm feature flag work for you? It's
become pretty clear to me that we need them anyway; most charms could
switch it on without issues, and proxy charms would be free to update on
their own schedules.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: state/api.State authTag vs tag

2014-06-17 Thread William Reade
On Tue, Jun 17, 2014 at 10:04 AM, roger peppe rogpe...@gmail.com wrote:

 On 16 June 2014 09:25, William Reade william.re...@canonical.com wrote:
  On Sun, Jun 15, 2014 at 2:58 PM, John Meinel j...@arbash-meinel.com
 wrote:
 
  I feel like we should consolidate these fields. And if we need authTag
  to match Login then we should be setting tag there instead. (That
 will be
  better for any case where we Login late, given that we probably still
 would
  want to be able to use anything like DebugLog URLs.)
 
  They currently match but are not guaranteed to; in particular, when we
  consolidate the agents such that a machine agent is running the uniters
  deployed on that machine, they will definitely not match.

 I don't understand this. Surely if a machine agent is running the uniters
 deployed on that machine, it will still log in as itself (the machine
 agent)
 and leave it up to the API server to allow that agent the authority
 to speak for those unit agents?


Yeah, but the (only, I think) usage of the authTag field is to specialise
relation calls by interested unit. We'll certainly be authed as the machine
agent, but the current form of api/uniter.State requires the unit tag, so
we will need some fix; whether that fix is a change to the relation
methods, or something else, is not especially interesting to me right now.

I agree that that the authTag and tag fields are mutually redundant.
 I think we should just delete tag,  and make both Open and Login save
 authTag and the password (authTag is a somewhat more self-explanatory
 name than tag IMHO).


So long as we're agreed that we only need one field, I think the choice of
name can be left to the implementer and tweaked in review if necessary.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: state/api.State authTag vs tag

2014-06-16 Thread William Reade
On Sun, Jun 15, 2014 at 2:58 PM, John Meinel j...@arbash-meinel.com wrote:

 I feel like we should consolidate these fields. And if we need authTag
 to match Login then we should be setting tag there instead. (That will be
 better for any case where we Login late, given that we probably still would
 want to be able to use anything like DebugLog URLs.)


They currently match but are not guaranteed to; in particular, when we
consolidate the agents such that a machine agent is running the uniters
deployed on that machine, they will definitely not match.

I'm guessing this is just a case of 2 different people working on the code
 and not realizing the data was available somewhere else, but I figured I'd
 run it by people in case I didn't understand what the use case was.


However, I don't think we have a good reason to keep authTag around at the
moment. There'll be some degree of churn when we do the consolidation, and
authTag will itself be wrong; I fear that having it around will nudge us
into a worse solution than we'd end up with if we just handled it when we
needed to.

Anyone have a reason *not* to drop it?

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Availability zone support

2014-06-13 Thread William Reade
On Fri, Jun 13, 2014 at 6:23 AM, Andrew Wilkins 
andrew.wilk...@canonical.com wrote:

 Hi all,

 I'm pleased to say that availability zone support for AWS and OpenStack
 has now landed on trunk, and will be available in 1.19.4. I'll write a blog
 post and maybe even some docs about it, but for now here's a brief
 description how you can use this feature.


Thanks Andrew, this is awesome.

Just to clarify the situation a bit further:

  * the azure provider already distributes instances for availability (but
doesn't use the AZ concept, and so won't expose a `zone=` placement
directive)
  * the maas provider doesn't, yet, but this is forthcoming imminently
  * the joyent provider doesn't, and certainly won't until joyent supports
AZs
  * the local and manual providers don't, and probably won't ever

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: $ juju switch --format

2014-06-06 Thread William Reade
On Fri, Jun 6, 2014 at 9:49 AM, Nick Veitch nick.vei...@canonical.com
wrote:

 sounds much better. Is there some way of throwing in whether HA is
 enabled in there too?


I'm not sure that's very closely related to the purpose of the command. HA
status is variable, and not a useful differentiator between environments; I
think that every piece of information that doesn't function to identify a
particular user/environment is suspect in this context.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Sharing environments - a proposal

2014-05-30 Thread William Reade
I don't think there is any reason to drop ca-cert from the .jenv for a
singular environment; when you're first connecting to any state server you
certainly need verification that it's really who it says it is. The exact
source of the trust depends on the scenario, for sure, but in the singular
state-server case a .jenv is the simple sane way to do it IMO.


On Fri, May 30, 2014 at 9:32 AM, roger peppe roger.pe...@canonical.com
wrote:

 On 30 May 2014 06:50, John Meinel j...@arbash-meinel.com wrote:
  ...
 
 
   PROBLEM: right now all connections to the api server are secured with
   TLS and the client-cert.
 
  As John says, this isn't actually true - connections are secured with
  a server cert and a password.
 
  Unfortunately I believe it is impossible to lose either one of these
  without rendering juju fundamentally insecure against man-in-the-middle
  attacks.
 
  If we take the approach you suggest, that's what we'd end up with.
 Anyone
  that can subvert the network between the juju connect command and the
  API server could pretend to be the desired environment, forwarding and
  changing requests as maliciously as it liked. There's no way that the
  client can know that it's talking to the middle-man, and no way for the
  server to know that it's not being addressed by the expected client.
 
  There is also the problem that the endpoint can change - with HA the
  actual API addresses can and will change (and there are many
  possible addresses too - we try to connect to all of them; that's
  not very convenient for a small piece of information to copy
  and paste)
 
 
  So we could certainly make it safe once you have securely connected 1
 time.
  In that we can ask what the CA cert is for this environment, and then
 make
  sure all future connections are validated with that CA.

 Yes. You have to work out how you're going to connect securely that
 first time though. How do you propose to do that?

   cheers,
 rog.

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Proposed new dependency: github.com/juju/errors (and github.com/juju/errgo)

2014-05-29 Thread William Reade
Masking is often necessary, and information hiding is valuable, and
depending on spooky action at a distance is generally foolish. But the
granularity with which we want to track the propagation of a given error is
*much* finer than the granularity of judicious information hiding: the
error is likely to pass through a lot more stack frames than it is package
boundaries, and the former transitions are the ones we need to track to
properly diagnose errors [0], while the latter are the ones we need to mask
to properly maintain global sanity [1].

To mask by default, at every annotation point, leads quickly to code whose
action is characterised by bizarre and mystifying errors [2]. To *fail* to
mask at sensible boundaries also leads to bugs, and we've seen one or two
of those as well; the trouble is that masking is *always easy*, while pure
annotation is (relatively) hard. To use a library that makes a point of
keeping preservation comparatively verbose and complex is to fundamentally
misunderstand the problem.

Cheers
William

[0] it's not a perfect one-to-one match, but it's a reasonable heuristic.

[1] it's not a perfect one-to-one match, but it's a reasonable heuristic.

[2] this because developers are forced to choose between obscuring their
logic with verbose workarounds for the masking... or to just avoid
annotation entirely, and hence come to habitually avoid masking even when
it's necessary. My lived experience indicates that they will overwhelmingly
choose the latter, and that the consequences are unpretty.


On Thu, May 29, 2014 at 9:16 PM, Nate Finch nate.fi...@canonical.comwrote:

 What Roger is talking about, is when an error is implementation-specific
 and may change if the implementation changes.

 For example, if you have a black box service that saves data, it might
 write to the network or it might write to a filesystem, but where it saves
 is an implementation detail.  Let's say in the current implementation,
 Save() returns a FileNotFoundError if you haven't called Initialize()
 first, then people may write code that handles a FileNotFoundError by
 calling Initialize() now if you change the implementation to write to
 the network, so now if they haven't called Initialize(), you return a
 NetworkNotFoundError in the same situation their code will break.   If
 you had instead written your own error, or just masked the type of the
 error to something generic, then they wouldn't have been able to write
 their code that makes bad assumptions.

 Note that this is the most simple case of the problem.  In reality, it can
 be that some sub-sub-sub package was returning that specific error, and
 your code had no idea, it just passed it on up unchanged.  This is where it
 gets insidious, because *your* code didn't change.  Some dependency of a
 dependency changed that you might not even realize you were calling.  And
 now people using your code are broken.  If you masked all returned errors
 except the ones you explicitly wanted to return, then this couldn't happen.






 On Thu, May 29, 2014 at 3:00 PM, Jeroen Vermeulen 
 jeroen.vermeu...@canonical.com wrote:

 On 2014-05-29 09:50, roger peppe wrote:

 On 29 May 2014 04:03, Tim Penhey tim.pen...@canonical.com wrote:


  Errors are worth treating specially here because they're
 they pass up through multiple layers, so it's very easy to break
 abstraction
 boundaries by relying on specific error types. I believe it's very
 important
 to *think* about the possible errors that can be returned from a given
 function, and not just throw up our hands and say you guys 5 layers
 down,
 why don't you just communicate *arbitrary stuff* to the guy 3 layers
 above.


 It may help to consider this as two problems.


 One: caller needs to know about a specific failure — e.g. because it's a
 failure to the callee but not to the caller.  Definitely part of the
 contract.  You either:

 (a) define a super-specific error (exception class, error code, etc.), or

 (b) document that standard error X means failure Y in this case, and the
 caller picks up the error as close as possible to its origin.

 With these errors you have to make sure that the information isn't
 diluted as it propagates, but usually you don't have to take it too far up
 the call chain.


 Two: a caller can deal better with some errors, given more detailed
 information.  You can help by attaching more information to the error
 (tags, taxonomy, properties) but only on a best-effort basis.  You accept
 that you don't know exactly which errors can come out of the code further
 down.

 For example, if you're writing code which speaks to another program over
 the network, you may want to know: is this connection still usable?  Do I
 know what happened to the operation I requested?  Were we able to connect
 in the first place?  Am I doing something that shouldn't ever work, such as
 mis-spell a command?

 With these errors you act on the best information you have, but you can
 always just fail.



Re: Proposed new dependency: github.com/juju/errors (and github.com/juju/errgo)

2014-05-27 Thread William Reade
Roger

We have had many long discussions, and you have provided much valuable
input: in particular, explaining to us that we were completely off-base re:
the attempts to use a slice vs a linked list in the error stacks. And we
did indeed agree to use errgo (as the only proposed implementation that Did
It Right in that regard). And you said you'd integrate it, and we agreed an
approach that I could live with, despite my misgivings re the interface and
the default behaviour; but, for a variety of perfectly good and excusable
reasons, this has still not happened after many months.

However, the conditions under which I agreed to use errgo were rather more
stringent than I feel you have represented. In particular, you did not
convince me that discard-by-default is a sane approach; you agreed to
preserve error types by default, in the hope of empirically convincing me
that, as the codebase evolved, we would end up annotating overwhelmingly at
boundaries where discarding made sense. You observe that this approach fits
well with current practice, and that's fine, but it fails to take into
account the fact that our error handling is currently *really bad*.

And this is, IMNSHO, *because* our original annotation practices discard
types (unless you write tedious verbose site-specific custom code); and so
we inevitably annotate only where we *have* to, lest we throw away useful
information; and so it's really hard to actually figure out what really
happened in a given error situation -- we end up having to grep the source,
and even when that works unambiguously it still *sucks*. Observing that
errgo is a flawless drop-in replacement for our historical practices is a
point *against* its adoption, not *for* it.

Still: we *are* leveraging errgo, because it does a lot of things right. As
Tim said, the fact that we can implement what we want in errgo is a strong
endorsement of its quality and flexibility. But the philosophy underlying
errgo's interface does not play nicely with the direction I want to take
the project [0], while the arrar-style interface does, and so we wrap errgo
such that it fits our needs (which Tim has already clearly described).

I don't want to appear to annex ownership of errgo; it's your package, and
it has a cool name, and I wish it popularity and widespread adoption. If
you feel that juju/errgo makes for a counterproductive canonical location,
we can happily thirdparty it in its current state; and you can write your
own error handling code outside juju, using an errgo outside juju, as you
wish. We could even rename our version if you want [1]. But I will not
stand for further delay and confusion and the associated fossilization of
poor practices in the codebase for which I bear responsibility.

Re unsafe: where we can't check equality, we fall back on identity. We know
it's a hack; and if it's fundamentally broken, we'd like to know why, so we
can fix it. Please clarify your objections.

Cheers
William

[0] ...which I am not interested in rehashing. My considered opinion is
that, while both approaches have potential pathological failure situations,
my experience across projects indicates that the ones that come up in
practice overwhelmingly favour discarding error types only when we're sure
it's a good idea to do so; writing code to explicitly preserve specific
errors at specific times will be a perpetual drag on clean refactoring, and
will rapidly lead to an accumulation of meaningless cruft that nobody has
the courage to remove just in case some distant module still depends upon
it.

[1] Maybe we could call it arrar...


On Tue, May 27, 2014 at 12:04 PM, roger peppe roger.pe...@canonical.comwrote:

 On 27 May 2014 04:02, Tim Penhey tim.pen...@canonical.com wrote:
  On 23/05/14 02:47, roger peppe wrote:
  Any particular reason to wrap the errgo functions rather than using
 errgo
  directly? Personally, I see the role of the errors package to be about
  specific error classifications (something that errgo tries hard to be
 entirely
  agnostic about), but errgo could, and I believe should, be used
 directly for all
  the more general calls.
 
  Sure.  There were several reasons that kept me away from using errgo
  directly or modifying errgo.
 
  1) Probably the most important, I don't feel that errgo is a 'juju'
  library but very much a personal library of yours, one that you have
  very strong feelings about, and I feel that proposing changes that
  fundamentally change how it behaves will get bogged down in
  bike-shedding and disagreement.  I was wanting to get the behaviour that
  we wanted, and I found a way to get that behaviour without touching
  errgo, so I went that way.

 errgo is now under github.com/juju precisely because I wanted to bring it
 into
 the juju fold - it's there *for* juju and it should do what juju needs.

 It's true that I have definite preferences for how some of the behaviour
 works, because I have thought a lot about the emergent effects of
 the way we handle 

Re: FullStatus API when machine or unit is StatusDown

2014-05-19 Thread William Reade
On Mon, May 19, 2014 at 3:55 AM, Menno Smits menno.sm...@canonical.comwrote:



 What I'd like to do is not have the client API munge the status info
 string and instead use the StatusData map to report the record the original
 status. When a unit or machine status is StatusDown a client can look in
 StatusData for the pre-agent-down status if it wants it.


+1 to this, but the new server still needs to generate the old data, so
that old clients can keep using the existing (agreed crazy) implementation;
new clients can fall back to outputting that when talking to an old server;
and a new client talking to a new server can build user output nicely out
of usefully orthogonal data.

It would also be good to move the if agent down set the status to
 StatusDown logic to somewhere deeper so that all clients get a consistent
 view of the status. I can live with it if this isn't feasible however -
 this part isn't necessary to get the ticket done.


Integrating this into the AllWatcher will be ...challenging, and you
shouldn't let it block progress on this bug. The least crazy way to do this
-- I think -- would be to add a presence worker that actively updates the
status data as it changes.

I'm concerned that this change may impact the GUI. As far as I can tell the
 GUI doesn't call FullStatus so any changes to that API shouldn't affect it
 (please correct me if I'm wrong). Changing how StatusDown is set however
 will probably impact the GUI.


I don't think the GUI ever sees presence information. That's a bug in
itself, but it doesn't affect the situation here; but it would again be
good if we were to keep presence information in StatusData, because then
the GUI would start to see those changes and could make use of them at
their leisure.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Actions

2014-05-15 Thread William Reade
Actions will not be using SSH to get to the machines; they'll be integrated
into the model. We hope to retire the direct-SSH implementation of juju run
so as to allow the same functionality via an implicit action, but we're not
worrying about that until we have the actions infrastructure in place.

Independent of juju run, I wouldn't *encourage* the use of arbitrary code,
but I can't see any sane way to prevent charm authors from defining an
action that accepted arbitrary code as a parameter -- and it's indeed
likely that we'll end up considering juju run to be literally an action
that accepts arbitrary code. We will of course need to exercise care wrt
permissions: the ability to execute juju run is clearly essentially
equivalent to the ability to ssh and hence execute arbitrary code.

Cheers
William


On Thu, May 15, 2014 at 1:13 AM, Mike Sam mikesam...@gmail.com wrote:

 I know juju run is out but I am wondering when we should expect Actions
 that do not require ssh access to the machines?

 Also, will actions allow arbitrary (defined at runtime) hook code or show
 the code be part of the charm already?

 Thanks,
 Mike

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: test-failure bugs

2014-05-15 Thread William Reade
Conversely, the occasional still seeing this at r comment is also
helpful.


On Thu, May 15, 2014 at 2:35 PM, Andrew Wilkins 
andrew.wilk...@canonical.com wrote:

 Hi all,

 This past couple of weeks Ian and I have been working on fixing tests so
 we're not constantly poking the bot.

 Several of the test-failure bugs I've looked at over the last couple of
 days have already been fixed, but it took a while to figure that out. It
 would be a great help if, when test-failure bugs are logged, the rev you're
 on is recorded in the bug.

 Cheers,
 Andrew

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: arch constraint default changed?

2014-05-13 Thread William Reade
Strongly agree with gustavo/henning. In *all* cases, the possibilities are
defined by the arches of the available tools, images, and instance types.
When using --upload-tools, the arches of the available tools are further
restricted, and may thus force i386, but that should have no impact
whatsoever on our method for choosing amongst the available options.

I remember writing a byArch sort that prioritised amd64; it seems to have
disappeared at some point in the last year, as the tools/instance-types/etc
code evolved, but this was always intended behaviour; please reinstate it.


On Tue, May 13, 2014 at 9:15 AM, Henning Eggers henn...@keeeb.com wrote:

 Although I don't know about --upload-tools, I have to agree with Gustavo
 here
 that selecting the instance arch depending on the workstation arch is
 unintuitive from a user's perspective. I would not expect that at all.

 Yes, amd64 is a very sensible default. I would wish that it stayed that
 way.

 Henning

 Am 12.05.2014 19:58, schrieb Gustavo Niemeyer:
  Why isn't the default tweaked by --upload-tools itself then?  We
  should be optimizing these options for users, rather than for
  developers, and it sounds sensible to assume that the vast majority of
  users do want to deploy on amd64 rather than i386 or arm.
 
  On Mon, May 12, 2014 at 2:53 PM, Nate Finch nate.fi...@canonical.com
 wrote:
  However, the fix is slightly different than just always choose amd64.
  Instead, we always choose the same architecture as the client machine,
 that
  way if the user uses --upload-tools, the tools will actually work on the
  cloud machine.
 
  This means that if you're running i386, you would still need --arch
 amd64 to
  get amd64 machines in the cloud.
 


 --
 Juju mailing list
 Juju@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju

-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


Re: arch constraint default changed?

2014-05-13 Thread William Reade
(for pedantry's sake: constraints are not really the issue here. They're
just a mechanism for filtering the acceptable set of results for a
provisioning decision; and they handy from a dev perspective in that they
allow us also to filter the inputs and cut down on the range of
possibilities we have to choose from; but there is no default arch
constraint, other than I don't care.)


On Tue, May 13, 2014 at 10:33 AM, William Reade william.re...@canonical.com
 wrote:

 Strongly agree with gustavo/henning. In *all* cases, the possibilities are
 defined by the arches of the available tools, images, and instance types.
 When using --upload-tools, the arches of the available tools are further
 restricted, and may thus force i386, but that should have no impact
 whatsoever on our method for choosing amongst the available options.

 I remember writing a byArch sort that prioritised amd64; it seems to have
 disappeared at some point in the last year, as the tools/instance-types/etc
 code evolved, but this was always intended behaviour; please reinstate it.


 On Tue, May 13, 2014 at 9:15 AM, Henning Eggers henn...@keeeb.com wrote:

 Although I don't know about --upload-tools, I have to agree with Gustavo
 here
 that selecting the instance arch depending on the workstation arch is
 unintuitive from a user's perspective. I would not expect that at all.

 Yes, amd64 is a very sensible default. I would wish that it stayed that
 way.

 Henning

 Am 12.05.2014 19:58, schrieb Gustavo Niemeyer:
  Why isn't the default tweaked by --upload-tools itself then?  We
  should be optimizing these options for users, rather than for
  developers, and it sounds sensible to assume that the vast majority of
  users do want to deploy on amd64 rather than i386 or arm.
 
  On Mon, May 12, 2014 at 2:53 PM, Nate Finch nate.fi...@canonical.com
 wrote:
  However, the fix is slightly different than just always choose amd64.
  Instead, we always choose the same architecture as the client machine,
 that
  way if the user uses --upload-tools, the tools will actually work on
 the
  cloud machine.
 
  This means that if you're running i386, you would still need --arch
 amd64 to
  get amd64 machines in the cloud.
 


 --
 Juju mailing list
 Juju@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju



-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


Re: arch constraint default changed?

2014-05-13 Thread William Reade
We shouldn't ever have to worry about whether or not --upload-tools was
used, because it's *already* been used at the point where we pick
instances, and the single possible arch is thus already chosen, entirely
independent of constraints or anything else. The only question should be:
given *multiple* possible architectures to launch, with nothing else to
decide between them, which do we pick?

The original algorithm was amd64 if available; if not, first
alphabetical. I still think that'd be better than what we have, but as our
options expand I think I'd prefer to go with first alphabetical 64-bit
arch, falling back to first alphabetical. Dissent?

Cheers
William


On Tue, May 13, 2014 at 3:59 PM, Nate Finch nate.fi...@canonical.comwrote:

 For what it's worth, I agree with everyone.  John and I discussed it, and
 I thought we had decided that we needed to use the local arch because of
 upload tools, evidently John though we'd decided in the other direction.
  And Gustavo is right that we should have pushed the discussion to the
 mailing list regardless.

 I didn't want the local arch have any influence on the arch we pick,
 unless the user uses --upload-tools, because as Gustavo said, where I'm
 sitting right now shouldn't affect what architecture my cloud uses.

 Honestly, the reason I didn't code the fix to take --upload-tools into
 consideration is because that was going to be more complicated and I didn't
 have time for it (the fix I made was tiny and quick).  Perhaps that was a
 misjudgment, and perhaps if we had moved the question to the mailing list
 it would have become obvious the time would be worth it.

 If people think it's worth it, we can just go ahead and make the right fix
 to use the local arch only when we use --upload-tools, but it still doesn't
 help us with the problem of which one we pick if they don't use upload
 tools, and multiple arches are available.  What logic would people
 recommend?  I don't think alphabetical is really the best choice, though at
 least it's deterministic, unlike take whatever happens to be listed first
 which is what we seemed to be doing before.


 On Tue, May 13, 2014 at 8:17 AM, Gustavo Niemeyer gust...@niemeyer.netwrote:

 On Tue, May 13, 2014 at 8:18 AM, John Meinel j...@arbash-meinel.com
 wrote:
  FWIW, I've gotten bug requests from a user that did a regular bootstrap
 and
  then was trying to juju upgrade-juju --upload-tools and was confused
 that
  his local machine wasn't able to upgrade tools for his environment. (he
 was
  running i386 locally, and bootstrap created an amd64 machine).
  And while we desperately want to not expose --upload-tools in its
 current
  incarnation, we haven't given an alternative for those use cases.

 There's nothing wrong with tweaking constraints if the application
 sees the --upload-tools option. At the same time, getting a bug from a
 user is not enough reason to tweak the defaults for everybody.

  Also, while we have a reasonably clear model for if you have the choice
  between amd64 and i386 pick amd64, I don't think people disagree with
 that.
  But what if you have the choice between amd64 and ppc64le and the
 client is
  running on ppc64le? Is it likely that they are actually thinking in a
 ppc
  world?

 I don't see how that logic holds. Using an arm laptop as a client does
 not imply I want to use all my server deployments on arm. The same
 holds true for the operating system, or to the Ubuntu series, or to
 pretty much anything else: I would not expect the tool to deploy a
 completely different environment based on where I'm sitting this
 second.

  We certainly had a lot of discussions around this between Nate and
 myself,
  and while I think I was reasonably convinced that we could just pick
 amd64
  if it was an option, it seems he was also reasonably convinced that we'd
  want to use the client arch if available. (He wrote it as just pick
 amd64,
  I argued against it but ended up feeling it was a reasonable pick among
  alternatives, but then he changed it before landing.)

 If I had the chance, I would submit these kinds of decisions to the
 development mailing list, rather than arbitrating it back and forth in
 isolation like that. This is changing the default behavior for
 everybody, so sending it for the team's appraisal feels cheap.


 gustavo @ http://niemeyer.net

 --
 Juju mailing list
 Juju@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju



 --
 Juju mailing list
 Juju@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju


-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


Re: Preferred wrapping for long function declarations

2014-05-13 Thread William Reade
I personally favour a variant of (3), without necessarily requiring that
every param get its own line: ie

func (c *CustomCaller) MethodCaller(
rootName string, version int, methodName string,
) (
rpcreflect.MethodCaller, error,
) {
...
}

...so that there's clear visual distinction between receiver/method-name,
args, and results; but we don't burn vertical space that doesn't support
that goal.

My personal opinion is that *of course* my preference is obviously correct;
but I really have no interest in *mandating* any particular style. I'm +1
on davecheney's best judgment approach in general: developers make their
code as clean as they can; reviewers flag problems they perceive; both are
responsible for quickly and reasonably resolving disagreements. If you
can't do that, come and see me and I'll demand that you do what I suggest
above :).

Cheers
William


On Tue, May 13, 2014 at 7:51 AM, David Cheney david.che...@canonical.comwrote:

 I don't want to have this bikeshed - I vote for people to use their
 best judgement and permit anything which fits through gofmt.

 On Tue, May 13, 2014 at 3:37 PM, John Meinel j...@arbash-meinel.com
 wrote:
  In the risk of running a bikeshedding paint-fest, I'm curious what the
  best-recommended practice is for wrapping function declarations that get
 a
  bit wide.
 
  Currently I'm writing a function that looks like this:
  func (c *CustomCaller) MethodCaller(rootName string, version int,
 methodName
  string) (rpcreflect.MethodCaller, error) {
  }
 
  But the line is about 119 characters, which isn't terrible, but is a bit
  wide.
 
  Should we:
 
  Not worry about it, you don't usually need to paste into an email and
 have
  crummy wrapping anyway, and 80-character terminals are useless. (and who
  would want to actually look at more than one document side-by-side
 anyway)
  Wrap everything to a single indent:
  This can be a slightly shallow:
  func (c *CustomCaller) MethodCaller(
  rootName string,
  version int,
  methodName string) (
  rpcreflect.MethodCaller,
  error) {
  }
  or go all the way with:
 
  func (c *CustomCaller) MethodCaller(
  rootName string,
  version int,
  methodName string,
  ) (
  rpcreflect.MethodCaller,
  error,
  ) {
  }
 
  (note that gofmt forces the )( and ){ to be left aligned).
  Of the two here I probably prefer the latter, because you can at least
  visually distinguish which collection is the parameters and what
 grouping is
  the return values.
  args then errors:
  func (c *CustomCaller) MethodCaller(
  rootName string, version int, methodName string) (
  rpcreflect.MethodCaller, error) {
  }
  My particular unhappiness is because the opening character has to be on
 the
  previous line so that we don't get the implicit semicolons.
  Fit what you can in ~even lines:
  func (c *CustomCaller) MethodCaller(rootName string, version int,
  methodName string) (rpcreflect.MethodCaller, error) {
  }
  This also the example for  wrap at 80 characters because you can't
 break
  methodName from string, you have to end the line with punctuation.
  close to 8, break out errors
  func (c *CustomCaller) MethodCaller(rootName string, version int,
 methodName
  string) (
  rpcreflect.MethodCaller, error) {
  }
  Note that my email client forces the wrapping on the first line.
  You're doing it wrong, functions with 3 params and 2 return types should
 be
  turned into some sort of Params struct instead. (I'd agree if it was 5
 but
  3 and 2 isn't really very many.)
 
 
  I think our policy is (3), and maybe that's the best of what I've
  encountered.It means that either everything is on one line, or
 everything is
  a single value per line, with clear delineation between args and errors.
 
  Thoughts?
  John
  =:-
 
  --
  Juju-dev mailing list
  Juju-dev@lists.ubuntu.com
  Modify settings or unsubscribe at:
  https://lists.ubuntu.com/mailman/listinfo/juju-dev
 

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: What happened to pinned bootstrap

2014-04-18 Thread William Reade
I remain strongly +1 on bootstrapping with tools that exactly match the
client version -- the client code near-enough directly drives those tools
as we bootstrap, and if we force pinned versions at bootstrap time we're
free to change how bootstrap works. If we need to deal with variation in
the version of the tools we're actually running, life becomes much harder
to no benefit. So: 1.18.2 client should always use 1.18.2 tools in the
course of bootstrapping.

As for automatically upgrading: it's clearly apparent that there's a
compelling case for not *always* doing so. But the bulk of patch releases
*will* be server-side bug fixes, and it's not great if we generally fail to
deliver those to casual users. I'm inclined to go with (1) a --version flag
for bootstrap, accepting only major.minor = client tools and (2) a
--dry-run flag for upgrade-juju (with the caveat that you'd need to pass
its result in as --version to the next command, because new tools *could*
be published in the gap between the dry-run and the, uh, wet one). Aaron,
even though this doesn't promote your use case to the default, I think
this'll address your issues; confirm?

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Implementing Juju Actions

2014-03-27 Thread William Reade
We should not be using juju run to implement juju actions; they should be
fed to the uniter analogously to hooks.

On Thu, Mar 27, 2014 at 1:32 PM, Gustavo Niemeyer gust...@niemeyer.netwrote:

 On Thu, Mar 27, 2014 at 6:53 AM, Tim Penhey tim.pen...@canonical.com
 wrote:
  The outline that Gustavo has given has actions fit into this world view.

 Indeed. Because that's the way that sounds reasonable, but I'm open to
 arguments about why that's not the case.


I am entirely on board with your interpretation of this issue. Juju run
works that way out of expediency, not conceptual purity or correctness. The
ideal implementation of juju actions would probably end up replacing the
client-side implementation of juju-run, in favour of storing run commands
as (pseudo-)actions themselves.


 It's the same problem of internal service coordination. A leader must
 be elected to take responsibility for procedures on behalf of the
 service.


Agreed; actions absolutely can and should target services, but it's
possible to make progress on unit actions without worrying about that too
much. I would expect us to run service actions on the leader unit, probably
in a different execution context that could, if necessary, made additional
tools available for managing cross-unit coordination. If you're at risk of
becoming blocked on in-juju leader election, let us know and we will try to
expedite it.

James, I think Gustavo's suggestion that you take a look at the Config is
good, because it would be sensible to use the same pathways for configuring
actions as we already do for charms. By following the path of that data
from `deploy` and `set`, over the API, into state, and back out over the
API to the unit agents, you should get a pretty good idea of one half of
the task; the other half, of tweaking the Uniter state machine to respond
to these new request kinds in a sane way, will probably demand a separate
discussion.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Where should loggo live (and errgo, ...)?

2014-03-06 Thread William Reade
On Thu, Mar 6, 2014 at 4:56 AM, Ian Booth ian.bo...@canonical.com wrote:

 When I wrote the original email, I was thinking of something like
 juju-team as
 well, for the generic library projects (pending bikeshedding of
 juju-team).
 github.com/juju/... would still be used for all the Juju specific
 projects.


I'm not sure we get much benefit from making this distinction. It's true
that someone just reading the juju source will not necessarily be aware
that loggo/errgo are independently viable; but whenever anyone sees them
actually used in an external project, it will be clear that they *are*
independent. Popularising those libraries in the first place is a separate
problem, and I'm not sure the precise github url is going to be a major
blocker there; and if it *is* -- if we hear people saying I'd use loggo if
it weren't in the juju namespace -- it'd be pretty easy to fix that, I
think.

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


manual bootstrap in 1.18

2014-03-03 Thread William Reade
Hi all

We've been talking about what we need to do to finalise 1.18, and the issue
with manual bootstrap addresses in environments.yaml [0] has been causing
us some problems; in particular, we think it was a mistake to put the
bootstrap node address in environments.yaml in the first place, and that
what we should have done was to allow for `juju bootstrap --name foo --to
ssh:blah.blah` to create a .jenv without any reference to environments.yaml
at all.

However, blocking 1.18 on this change would be most unhelpful to many
people, so we would like to simply switch off manual provider bootstrap for
the 1.18 release; this would *not* be a regression, because we didn't have
manual bootstrap in 1.16.

If this would make you seriously unhappy, please speak now so we can try to
figure out a path that minimises global sadness.

Cheers
William



[0] https://bugs.launchpad.net/juju-core/+bug/1282642
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Juju Docs - SSH key requirements

2014-01-31 Thread William Reade
I absolutely agree that we should drop steps that are not necessary from
the documentation (so long as we *do* still describe authorized-keys-[path]
for those with more sophisticated requirements). But we can't do this while
1.16 is the latest stable version, lest we confuse all those users; and
even when 1.18 comes out, we can expect that some people will still be on
1.16 and want to be able to see relevant documentation.

On a related note, the various config-* pages still variously reference
default-series/admin-secret/control-bucket [0], and all the generate-config
output is out of date. This is basically the same poor experience we're
about to hit with the ssh-keygen stuff, and AFAICT it's currently
inescapable because (1) the docs are not separated by juju-core version, so
the developers literally *cannot* document upcoming features in a sane way
(2) the docs are not in the source tree; so, out of sight, out of mind and
(3) the docs are all raw HTML, so nobody's ever going to want to edit them
*anyway* [1].

It is critically important that we convert our docs so that they are
segregated by version, accessible to developers, and editable in *some*
sanely human-writable format [2]. Sorry I didn't push harder on this at
burlingame; I thought we could maybe continue as we were, but we really
can't. Let's fix it.

Cheers
William


[0] as does http://maas.ubuntu.com/docs/juju-quick-start.html, but I'm not
sure if that's yours.
[1] eg
http://bazaar.launchpad.net/~charmers/juju-core/docs/view/head:/htmldocs/authors-charm-best-practice.html--
less than a third of that is actual content, and the potential for
breakage on edit is *way* too high. And going forward, the prospect of
fixing some minor structural issue across N versions of the docs just
depresses me.
[2] I don't *care* which. Markdown? ReST? Something better I don't know? So
long as it's less work that HTML, please just mandate one and be done with
it.


On Fri, Jan 31, 2014 at 6:58 AM, Andrew Wilkins 
andrew.wilk...@canonical.com wrote:

 Hi Nick,

 https://juju.ubuntu.com/docs/getting-started.html

 On the Intro/Getting Started page for Juju, we say that you *need* to
 generate an SSH key pair. This is no longer true in 1.17.x: Juju will
 generate one for you. Juju will continue to upload the default public keys
 from ~/.ssh, but they are no longer absolutely required.

 I'm not sure if we should reword the docs or not, but thought I should at
 least bring this to your attention.  CC'ing the dev list in case someone
 has an opinion.

 Cheers,
 Andrew

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: juju system ssh keys - revisiting

2013-12-17 Thread William Reade
On Tue, Dec 17, 2013 at 7:20 AM, John Arbash Meinel
j...@arbash-meinel.comwrote:

  I believe the rationale was so that juju-run can target machines
  as well as units. To target a machine without any units deployed
  would mean hooks are out of the question.
 

 Then just run a hook context runner in the Machine agent. Still *much*
 better than actually needing to SSH into every machine and violating
 the model of every-other-way we run stuff on machines in the environment.


It's not quite that simple: a hook context demands a bunch of unit-specific
information. Faking up that information for a machine context with the
same tools available is not likely to be very helpful; the only point of
similarity, I think, is that machine commands usually want to be serialized
with unit commands and hook executions.

To speak to your larger point: we have a *very* specific mechanism for
running hooks on remote units, and most of that mechanism is simply
inapplicable here; the choice of what hooks to run is determined by the
individual units, after interpreting remote state in the context of local
state, and the API server certainly knows nothing about hooks. So, ofc, it
is perfectly reasonable to suggest that this sort of functionality be
mediated by the API (even if it doesn't entirely replace the need for live
SSH access), but it's not as simple as paraphrasing we have the
functionality, let's use it.

To speak to tim's general points -- we cannot indeed assume that users have
ssh keys, or even that they are entirely au fait with what ssh keys are and
do. Allowing those users to negotiate a bootstrap without confronting that
topic is clearly a Good Thing, and we must be prepared to allow *new* users
to be able to bootstrap environments from a GUI alone; and ofc we can
support these use cases with the key-updating stuff we just wrote. So for
those reasons, yes, we should generate environment identities.

But the juju-run perspective STM to favour jam's views. Live SSH access for
individual users must surely be done via the user's own SSH key -- it
doesn't seem like an API connection is really very well suited to this use
case -- but the user who wants to `apt-get upgrade` across a few thousand
machines is probably not well served by an architecture that demands that
*anything* make a separate SSH connection to each machine.

So, I endorse the general idea of an environment identity, but I don't
think it's the right mechanism for bulk juju-run commands. There's clearly
open space for extracting the  serialized execution context code (that can
be reused by unit and machine agents [0]); there's surely an opportunity
for us to write new code that delivers those commands to either unit or
machine agents; and there may even be some possibility of integrating that
delivery code with the hook queuing mechanism to make that code more
generally useful [1].

WRT juju-run in particular, I think it's more important to figure out the
right API-based delivery mechanism than to add environment identities;
those *are* important too, but I don't think they're really part of this
work.

Cheers
William



[0] There's *also* definitely a use case for executing code directly on a
machine *outside* a serialized context, and we mustn't cut this off. This
should not be the default, though.

[1] at the moment we don't have any ability to set priority across queues
for different relations.



 John
 =:-
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.13 (Cygwin)
 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

 iEYEARECAAYFAlKv7T0ACgkQJdeBCYSNAAPsLwCfVDV/adPkzKk8RfIhEJgCZEEl
 gU4An3TxGlipHH2FhDFlH8YSlfPtdOrN
 =EDgS
 -END PGP SIGNATURE-

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Do we care about 1.16 compatibility for 'juju add-machine ssh:user@host ?

2013-12-08 Thread William Reade
On Sun, Dec 8, 2013 at 2:35 PM, John Arbash Meinel
j...@arbash-meinel.comwrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 So I'm trying to finish the work to maintain compatibility between
 trunk and 1.16 for all commands.

 I think I've finished everything except for manual provisioning.
 However, this rabbit hole keeps getting deeper, so I wanted to ask how
 much we really care.

 The issues I've run into so far:
   lp:~jameinel/juju-core/add-machine-manual-compat-1253631

 1) Code factoring. We have a call to 'recordMachineInState' which
 actually does a bit of work to figure out what state the machine is in
 (series, arch, etc) and then directly calls the API. I have one change
 which I think makes it much nicer, which is to have 1 call to gather
 the information, and then a *different* function to actually record
 it. (That way the gather step can be reused when we have to fall back
 to direct DB manipulation).


+1, that sounds like the right factoring to me.

However, it actually doesn't really do what we want anyway. The code
 in question detects that we had an error after allocating a machineId,
 and then tries to clean up by calling DestroyMachine. It is fairly
 likely, though, that the machine agent will never actually come up.


I don't follow. If there's no machine id, there's no machine, and no need
to destroy it.

It seems like a lot of busy work to end up with a machine that is in a
 bad state.


So, do we (1) figure out what to run (series, arch) and whether it's
possible; (2) add that machine to state; (3) start the agent on the machine?

If (2) fails, we shouldn't have changed the machine; and if (3) fails I
agree we need a fast path to just nuke the machine in state, but it's
surely not going to leave the actual system in a confused state, is it?

4) Client.MachineConfig didn't exist in 1.16. This is probably the
 biggest deal. The API actually does a lot of work. It grabs the API
 and State addresses, looks up tools for the machine (in the provider),
 and sets up a new Machine ID + Password combination for the agent.

 The big thing is having to reproduce all that chunk of code seems like
 a PITA and searching for tools from the client is annoying to do again.


I'm willing to pay an additional tools lookup to keep manual provisioning
in sync with everything else; and ISTM that this is otherwise just a matter
of moving that code temporarily to a shared location (like statecmd), and
then putting it back into the api server directly when we don't need direct
db access any more. Am I missing something?

Cheers
William
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: New juju-mongodb package

2013-11-28 Thread William Reade
The mongodump and mongorestore binaries would also be extremely useful.

Cheers
William


On Thu, Nov 28, 2013 at 3:38 PM, James Page james.p...@canonical.comwrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 On 28/11/13 14:18, Gustavo Niemeyer wrote:
  It would be good to have the mongo binary available and working
  as well, also under that juju-specific namespace. This is the
  console client, and will be useful to connect to the local juju
  database when debugging issues.

 Makes sense - added.

 - --
 James Page
 Technical Lead
 Ubuntu Server Team
 james.p...@canonical.com
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.15 (GNU/Linux)
 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

 iQIcBAEBCAAGBQJSl1VnAAoJEL/srsug59jDDeoQAL9mi/qX8QlC9sx0i3MTjT4W
 N8VlvReHg3bgYlig5qZVu6ShMiZ3vd6mY9qoFB+FdVEmMEazHr5+am/JgmLAF+6G
 nOEAwsiZSBPO/pMOypXPdp0uvSlL8dmCN3aXzQ8X7R0hWZhm1dHhk6DHaY+VImIr
 pjs31OaRpCgxqgAjwYcw76apVb0gnKNrDwYA45lPDPMlufFUe7l1xlGJJJFPj8Ap
 9QC8Jzy/LeYagb3D4SeMEAEnpSg2nuoQr1KXVbb87RPPUy6XAFWfyTNyaLmMgypG
 Ci254fcC6U22gFLyVj4A5oXSFR2a3tLEEkubN4qJHMVXRFJm2OxZZrTSnNpnUiOW
 9vSQmteLUxMMKa6RugZRNJFPWZBu4Sg6O/uV5/y9Lv675sVfJy+j12N1RxAHXmae
 MZgsKj7F7LIOghB18bJvJh6r2FUCZc5l7pywLUysCtEWw1lCBTkCHe1QqCpRR/Fu
 r2w85OAM4IOdrtyMa1mpt0G78jJgq7wvyFifI+MXo/WvTmE8G7EnKDvreN76z0FI
 jagTKvSgqqRQ91tNjiUkFuKKbji/KBjOjmXf88ho3pmMfYRSm/LZhytBNgfxMiKs
 XVWc9t+MFOcs0GINdh9pYJQf+9SE7XpTkLkLcQ9QpgDVFmZpu8trkec40WOxJ4nq
 AcybyPpBCHO5wjyQ+pvi
 =++qm
 -END PGP SIGNATURE-

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: relations system in juju

2013-11-22 Thread William Reade
Hi Vasiliy

There's some moderately high-level documentation of what happens in
doc/charms-in-action.txt in the juju-core source tree, but that doesn't
really get into the details. If you're interested in diving through the
code, the most fundamental piece is the state.RelationUnit type
(state/relationunit.go): its EnterScope, LeaveScope, and Settings methods
are responsible for the inputs into the system, and the Watch method
(implemented in state/watcher.go) is responsible for pulling all those
changes into a stream of events for consumption by the units. The
worker/uniter package -- particularly relationer.go, and the relation
subpackage -- are the clients responsible for turning that stream of events
into actual hook executions.

At heart, it all depends on Gustavo Niemeyer's mgo driver for mongodb on
golang; we manage db changes with the mgo/txn package, and (in juju-core
again) the state/watcher package turns the transaction log into events
consumed by the RelationUnitsWatcher.

HTH
William


On Fri, Nov 22, 2013 at 7:20 AM, Vasiliy Tolstov v.tols...@selfip.ruwrote:

 Hi all. I want to create simple monitoring in golang =). I'm very
 interesting in relations system in juju. How can it works and what
 files contains code for deal with it. Is there exists some
 docs/articles about this system?

 --
 Vasiliy Tolstov,
 e-mail: v.tols...@selfip.ru
 jabber: v...@selfip.ru

 --
 Juju mailing list
 Juju@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju

-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


Re: openstack hard depency or not

2013-11-19 Thread William Reade
Hi Vassily

I'm interested to know a little bit more about your use case. It sounds
like you're interested in managing client user identities and permissions
within a single environment, so as to avoid the costs of running one
environment for each of your clients -- is that accurate? Can you tell me a
little bit more about what you're planning to do -- for example, are you
considering allowing clients to manage their services via juju directly?

The on-demand state server is potentially tricky to arrange -- the juju
model involves a state server that's always running, with a set of agents
connecting to that server to monitor the desired state of the system and
respond to changes. It *is* designed to have the configured services
resilient in the face of management failure -- so that losing juju does not
mean you lose your services -- but we wouldn't encourage deliberately
shutting down management to save on state servers.

With respect to the next two cycles, we mean roughly less than a year.

Cheers
William



On Mon, Nov 18, 2013 at 8:04 AM, Vasiliy Tolstov v.tols...@selfip.ruwrote:

 2013/11/18 John Arbash Meinel j...@arbash-meinel.com:
  Given that you can run multiple services on a given machine, is there
  a strong reason that multiple state servers are a problem for you? (I
  want to make sure if there is a use case that is a problem our
  solution actually covers it.)


 As i see , clients that want wordpress or something else does not want
 to create unneeded things.
 After they installs wordpress (for example) they can test it and works
 with it. After sometime if all goes ok thet want to add load balancer
 and something new to it.
 But i don't think that all time they need state server. I think best
 of all spawn state server on demand. For example user interact with
 juju with provided keys, my own server (or such thinkg) check incoming
 key and spawn lxc container with state server and proxy to it (or this
 can be do it in state server itself).


 --
 Vasiliy Tolstov,
 e-mail: v.tols...@selfip.ru
 jabber: v...@selfip.ru

 --
 Juju mailing list
 Juju@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju

-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


  1   2   >