FullStatus API when machine or unit is StatusDown

2014-05-18 Thread Menno Smits
I'm working on a change to make "juju status" report the remote unit when a
relation hook fails (https://bugs.launchpad.net/juju-core/+bug/1194481).

During this work I've noticed that the FullStatus client API does some
munging of the agent state info string if the agent appears to be dead but
shouldn't be. It replaces the agent state info string with "(status: status
info)" (note the parens) and then changes the returned status to
StatusDown. You've probably noticed juju status strings with parens around
them before - this is why.

This code is at state/apiserver/client/status.go:540.

This is problematic for two reasons:

   1. the client API code is making formatting decisions that should be
   left up to the client. In the case of the ticket I'm working on, the juju
   status command handler ends up having to detect and unpick the parens in
the status string while generating the status output so that it can correctly
   append the remote unit name. This is easy enough but ugly and feels wrong.
   2. because the munging is done within the FullStatus API call, clients
   that are consuming events via a watcher will not see the munged status and
   info. If a client was to use both a watcher and FullStatus, it could see
   different status values for a unit or machine between the two APIs.

What I'd like to do is not have the client API munge the status info string
and instead use the StatusData map to record the original
status. When a unit or machine status is StatusDown a client can look in
StatusData for the pre-agent-down status if it wants it.
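As a rough sketch of this direction (all type names and StatusData keys here are invented for illustration; the real code lives in state/apiserver/client/status.go), the adjustment could record the original status in StatusData instead of rewriting the info string:

```go
package main

import "fmt"

// Status values, mirroring a subset of Juju's status constants.
const (
	StatusStarted = "started"
	StatusDown    = "down"
)

// EntityStatus is a simplified stand-in for the per-entity status
// returned by FullStatus.
type EntityStatus struct {
	Status     string
	Info       string
	StatusData map[string]interface{}
}

// adjustForAgentDown sketches the proposed behaviour: when the agent
// appears to be dead, record the original status and info in StatusData
// and set the status to StatusDown, leaving Info alone so clients can
// format it however they like.
func adjustForAgentDown(s EntityStatus) EntityStatus {
	s.StatusData = map[string]interface{}{
		"original-status": s.Status,
		"original-info":   s.Info,
	}
	s.Status = StatusDown
	return s
}

func main() {
	s := adjustForAgentDown(EntityStatus{Status: StatusStarted, Info: "hook failed"})
	fmt.Println(s.Status, "|", s.StatusData["original-status"], "|", s.Info)
}
```

A client that wants the pre-agent-down status would then read it from StatusData rather than parsing parens out of the info string.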

It would also be good to move the "if agent down set the status to
StatusDown" logic to somewhere deeper so that all clients get a consistent
view of the status. I can live with it if this isn't feasible however -
this part isn't necessary to get the ticket done.

I'm concerned that this change may impact the GUI. As far as I can tell the
GUI doesn't call FullStatus so any changes to that API shouldn't affect it
(please correct me if I'm wrong). Changing how StatusDown is set however
will probably impact the GUI.

Thoughts?

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Schema migration process

2014-05-20 Thread Menno Smits
On 19 May 2014 16:35, Tim Penhey  wrote:

> On 16/05/14 17:29, John Meinel wrote:
> > So I'm pretty sure the ability to do schema upgrades was scoped in the
> > HA work that Nate is working on. I believe the idea is roughly:
>
> OK, I see the card on the board on Nate's team for this iteration.
>
> Is it likely to get done?  Since this is something that we (onyx) need
> for the identity work, and from the description given it looks like it
> could be broken up into several different branches, is it something that
> we could help with to get there sooner?
>

Any idea when this will get done Nate? Can we (Onyx) help with any part of
it?


Re: Schema migration process

2014-05-29 Thread Menno Smits
Team Onyx is accumulating tasks that can't be completed until the schema
migrations infrastructure is in place so I've started digging in to the
details to figure out what exactly needs to be done. I'm aiming to come up
with an approach and some of the work items for the initial incarnation of
schema migrations.

*Overall Approach*

Building on John's thoughts, and adding Tim's and mine, here's what I've
got so far:

- Introduce a "database-version" key into the EnvironConfig document which
tracks the Juju version that the database schema matches. More on this
later.

- Introduce a MasterStateServer upgrade target which marks upgrade steps
which are only to run on the master state server. Also more below.

During an upgrade:

- State machine agents get upgraded first - other agents don't see the new
agent version until the state machine agents are running the new version (I
have double-checked that this is already in place)

- When state machines are restarted following an upgrade, don't allow other
agents to Login until the upgrade is finished. Allow Client connections
however (for monitoring during upgrades as discussed by John).

- Non-master JobManageEnviron machine agents run their upgrade steps as
usual and then watch for EnvironConfig changes. They don't consider the
upgrade to be complete (and therefore let their other workers start) until
database-version matches agent-version. This prevents the new version of
the state server agents from running before the schema migrations for the
new software version have run.

- The master machine agent waits for the other JobManageEnviron agents to
have upgraded to the new version and then runs the upgrade steps including
those for the MasterStateServer upgrade target (these steps will be the
schema upgrades). The wait ensures that the database schema isn't upgraded
before a state server is ready for the changes (this could happen if a
non-master state server is slow or the master is especially fast).

- Once the master machine agent has completed its upgrade steps it updates
database-version in EnvironConfig to the new Juju version. This signals to
the other state machine agents that they can complete their upgrades,
allowing their workers to start.

- At this point, all state servers have completed their upgrades and allow
agent logins.

- Now that agents can connect to the upgraded state servers, they see the
new agent version and upgrade themselves (this functionality already exists)


*Observations/Questions/Issues*

- There are a lot of moving parts here. What could be made simpler?

- What do we do if the master mongo database or host fails during the
upgrade? Is it a goal for one of the other state servers to take over, run
the schema upgrades itself, and let the upgrade finish? If so, is this a
must-have up-front requirement or a nice-to-have?

- Upgrade steps currently have access to State but I think this probably
won't be sufficient to perform many types of schema migrations (e.g.
accessing defunct fields, removing fields, adding indexes, etc.). Do we want
to extend State to provide a number of schema migration helpers or do we
expose mongo connections directly to the upgrade steps?

- There is a possibility that a non-master state server won't upgrade,
blocking the master from completing the upgrade. Should there be a timeout
before the master gives up on state servers upgrading themselves and
performs its own upgrade steps anyway?

- Given the number of documents a juju system stores, it's likely that the
schema migration steps will be quite quick, even for a large installation.
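On the question of what upgrade steps get access to, one possible shape is to hand steps a richer context. Everything below is hypothetical: a real implementation would expose an mgo session or State helpers rather than the in-memory map used here to keep the sketch runnable.

```go
package main

import "fmt"

// rawDB is a toy stand-in for direct database access (a real
// implementation might expose an mgo session instead).
type rawDB map[string][]map[string]interface{}

// UpgradeContext is a hypothetical context handed to upgrade steps,
// giving schema migrations access to the raw database.
type UpgradeContext struct {
	DB rawDB
}

// dropPortsField is an example schema migration step body: it removes a
// defunct field from every machine document.
func dropPortsField(ctx *UpgradeContext) error {
	for _, doc := range ctx.DB["machines"] {
		delete(doc, "ports")
	}
	return nil
}

func main() {
	ctx := &UpgradeContext{DB: rawDB{
		"machines": {{"_id": "0", "ports": []int{22}}},
	}}
	if err := dropPortsField(ctx); err != nil {
		panic(err)
	}
	_, present := ctx.DB["machines"][0]["ports"]
	fmt.Println("ports field present after migration:", present)
}
```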


I'm new to all this so please chime in with your suggestions and
corrections. Team Onyx (well, me at least) is likely to start on this next
week.

- Menno

p.s. I'm out until Tuesday so it's unlikely I'll see (or at least respond
to) any replies until then.

On 16 May 2014 17:29, John Meinel  wrote:

> So I'm pretty sure the ability to do schema upgrades was scoped in the HA
> work that Nate is working on. I believe the idea is roughly:
>
> 1) State machines get restarted first (there is code today to not let
> other agents notice that there is an upgrade ready until the state machine
> itself has been upgraded)
>
> 2) When the state machine is restarted, don't allow other agents to Login
> until the API Worker has connected to the API Server. This should mean that
> we are in a relatively quiet mode.
>
> 2a) For reasons I'll describe later, still allow Client connections.
>
> 3) When the APIWorker comes up and sees that there is an Upgrade that just
> happened and Upgrade actions are pending, wait to start the upgrade steps
> until we observe that all other machines with JobManageEnviron have been
> upgraded to the new version. Continue blocking Login requests.
>
> 4) Once all JobManageEnviron machine agent docs report the correct Agent
> Version, then the Singleton Upgrader (so the one running on the machine
> with the master Mongo DB), applies the DB Schema Upgrade 

Re: Schema migration process

2014-06-02 Thread Menno Smits
On 30 May 2014 01:47, John Meinel  wrote:

>
>
>> Building on John's thoughts, and adding Tim's and mine, here's what I've
>> got so far::
>>
>> - Introduce a "database-version" key into the EnvironConfig document
>> which tracks the Juju version that the database schema matches. More on
>> this later.
>>
>
> For clarity, I would probably avoid putting this key into EnvironConfig,
> but instead have it in a separate document. That also makes it easy to
> watch for just this value changing.
>

SGTM. I've got no strong opinion on this.


>
> Potentially, I would decouple the value in this key from the actual agent
> versions. Otherwise you do null DB schema upgrades on every minor release.
> Maybe that's sane, but it *feels* like they are too separate issues. (what
> is the version of the DB schema is orthogonal to what version of the code
> I'm running.) It may be that the clarity and simplification of just one
> version wins out.
>

I think it makes sense to just use the Juju version for the DB schema
version. When you think about it, the DB schema is actually quite tightly
coupled to the code version so why introduce another set of numbers to
track? I'm thinking that if there are no schema upgrade steps required for a
given software version then the DB is left alone except that the schema
version number gets bumped.
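That "no steps, still bump the version" idea could be sketched with a toy migration table (versions and step bodies invented for illustration):

```go
package main

import "fmt"

// migrations maps a Juju version to its schema migration steps, if any.
// All versions and step bodies here are invented for illustration.
var migrations = map[string][]func() error{
	"1.20.0": {func() error { fmt.Println("adding index"); return nil }},
	// "1.20.1" has no schema changes and so no entry.
}

// upgradeDatabase runs any steps registered for the target version and
// bumps database-version regardless, keeping it in lock step with the
// software version.
func upgradeDatabase(dbVersion *string, target string) error {
	for _, step := range migrations[target] {
		if err := step(); err != nil {
			return err
		}
	}
	*dbVersion = target // bumped even when there were no steps
	return nil
}

func main() {
	dbVersion := "1.20.0"
	if err := upgradeDatabase(&dbVersion, "1.20.1"); err != nil {
		panic(err)
	}
	fmt.Println("database-version is now", dbVersion)
}
```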


> - Introduce a MasterStateServer upgrade target which marks upgrade steps
>> which are only to run on the master state server. Also more below.
>>
>
> This is just a compiled-in list of steps to apply, right?
>

Yes. I was thinking that schema upgrade steps would be defined in the same
place and way that other upgrade steps are currently defined so that they
could even be interleaved with other kinds of upgrade steps.

What I'm proposing here is that where we currently have 2 types of upgrade
targets - AllMachines and StateServer - we introduce a third target called
MasterStateServer which would be primarily (exclusively?) used for schema
migration steps.
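A minimal sketch of that third target (invented names, standing in for the real compiled-in upgrade-step definitions):

```go
package main

import "fmt"

// Target identifies which machines an upgrade step runs on: the two
// existing targets plus the proposed third.
type Target int

const (
	AllMachines Target = iota
	StateServer
	MasterStateServer // proposed: schema migration steps run only here
)

// Step mirrors the shape of a compiled-in upgrade step definition.
type Step struct {
	Description string
	Targets     []Target
}

// stepsFor filters the compiled-in list down to the steps applicable to
// the machine running the upgrade.
func stepsFor(steps []Step, target Target) []Step {
	var out []Step
	for _, s := range steps {
		for _, t := range s.Targets {
			if t == target {
				out = append(out, s)
				break
			}
		}
	}
	return out
}

func main() {
	steps := []Step{
		{"move tools to new location", []Target{AllMachines}},
		{"refresh API server certs", []Target{StateServer}},
		{"add unit relation index", []Target{MasterStateServer}},
	}
	for _, s := range stepsFor(steps, MasterStateServer) {
		fmt.Println("running:", s.Description)
	}
}
```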


>> - Non-master JobManageEnviron machine agents run their upgrade steps as
>> usual and then watch for EnvironConfig changes. They don't consider the
>> upgrade to be complete (and therefore let their other workers start) until
>> database-version matches agent-version. This prevents the new version of
>> the state server agents from running before the schema migrations for the
>> new software version have run.
>>
>
> I'm not sure if schema should be done before or after other upgrade steps.
> Given we're really stopping the world here, it might be prudent to just
> wait to do your upgrade steps until you know that the DB upgrade has been
> done.
>

As mentioned above, with what I'm thinking there is no real distinction
between schema migration steps and other types of upgrade steps so there's
no concept of schema migrations happening before or after other upgrade
steps.

  *Observations/Questions/Issues*
>
>>
>> - There are a lot of moving parts here. What could be made simpler?
>>
>> - What do we do if the master mongo database or host fails during the
>> upgrade? Is it a goal for one of the other state servers take over and run
>> the schema upgrades itself and let the upgrade finish? If so, is this a
>> must-have up-front requirement or a nice-to-have?
>>
>
> Some thoughts:
>


> 1. If the actual master mongo DB fails, that will cause reelection, which
> should cause all of the servers to get their connections to Mongo bounced,
> and then they'll notice that there is a new master who is responsible for
> applying the database changes.
>

 We will have to do some testing to ensure that this scenario actually
works. Maybe I'm overthinking it, but my gut says there's plenty that could
go wrong here.

> 2. If it is just the master Juju process that fails, I don't think there is
> any great expectation that a different process running the same code is
> going to succeed, is there?
>

Agreed.


> 3. There is also a fair possibility that the schema migration we've
> written won't work with real data in the wild. (we assumed this field was
> never written, but suddenly it is, etc). We've talked about the ability to
> have Upgrade roll back, and maybe we could consider that here. Some
> possible steps are:
>
>1. Copy the db to another location
>2. Try to apply the schema updates (either in place or only to the
>backup)
>3. If upgrade fails, roll back to the old version, and update the
>AgentVersion in environ config so that the other agents will try to
>"upgrade" themselves back to the old version. This would also be a reason
>to do the DB schema before actually applying any other upgrade steps. We
>probably want some sort of "could not upgrade because of" tracking here, so
>that it can be reported to the user
>
>
I like this and it should work as long as there's enough storage available
to make a copy of the database. I'm not exactly clear on how we would
revert to the backup instance if the migration fails b

Re: Juju is on github

2014-06-04 Thread Menno Smits
On 5 June 2014 06:58, Jorge Niedbalski 
wrote:

>
> 2cents: Why not rename the CONTRIBUTING file to CONTRIBUTING.md
> so it will be rendered as a markdown file by github?
>

That seems sensible. Any objections?


Re: Reviewing in progress work on Github

2014-06-04 Thread Menno Smits
You can keep updating a pull request as you work on a feature. Old diffs
and comments are preserved. One nice feature is that old comments are hidden
if the area of the code they reference is changed by a later diff.

This link from the Github people themselves is light on detail but talks
about how they use a pull request as a way of tracking work in progress
right from the beginning of the development of a feature.

https://github.com/blog/1124-how-we-use-pull-requests-to-build-github

I guess this doesn't address the email spam problem but it does seem like
it would work otherwise. People can individually opt out of notifications
about a specific PR but that's not really ideal.




On 5 June 2014 15:41, Ian Booth  wrote:

> You can, but then you loose any previous comments and history on the
> previously
> work in progress pull request. With Launchpad, you could just flip the
> status on
> the merge proposal and all the comments would be retained.
>
> On 05/06/14 13:38, John Meinel wrote:
> > Can't you just create a new Pull request when it is finally ready?
> >
> > John
> > =:->
> >
> >
> >
> > On Thu, Jun 5, 2014 at 7:25 AM, Ian Booth 
> wrote:
> >
> >> One of the many things I miss now that we have moved to Github/git is
> the
> >> ability to put up a merge proposal with in-progress work, allowing
> >> collaboration
> >> on the implementation as it evolves etc. Launchpad supported this
> nicely,
> >> as it
> >> didn't spam people with emails for wip mps etc.
> >>
> >> I don't think Github supports this concept. Perhaps a way around it is
> to
> >> do a
> >> pull request against one's forked copy of the main Juju repo. That way,
> >> people
> >> can still easily see your work, but without the general spam. Sadly, it
> >> doesn't
> >> allow the pull request to then be re-targetted to the Juju repo when
> ready
> >> to be
> >> reviewed for real. Or does it?
> >>
> >> Does anyone have any better ideas how to do this?
> >>
> >> --
> >> Juju-dev mailing list
> >> Juju-dev@lists.ubuntu.com
> >> Modify settings or unsubscribe at:
> >> https://lists.ubuntu.com/mailman/listinfo/juju-dev
> >>
> >
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>


Re: Schema migration process

2014-06-05 Thread Menno Smits
After some fruitful discussions, Tim and I have come up with something that
I think is starting to look pretty good. There's a significant change to
how we handle backups and rollbacks that seems like the right direction.
I've tried to capture it all in a Google Doc as this email thread is
starting to get impractical. Feel free to add comments and edit.

https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing


On 3 June 2014 13:34, Menno Smits  wrote:

> On 30 May 2014 01:47, John Meinel  wrote:
>
>>
>>
>>> Building on John's thoughts, and adding Tim's and mine, here's what I've
>>> got so far::
>>>
>>> - Introduce a "database-version" key into the EnvironConfig document
>>> which tracks the Juju version that the database schema matches. More on
>>> this later.
>>>
>>
>> For clarity, I would probably avoid putting this key into EnvironConfig,
>> but instead have it in a separate document. That also makes it easy to
>> watch for just this value changing.
>>
>
> SGTM. I've got no strong opinion on this.
>
>
>>
>> Potentially, I would decouple the value in this key from the actual agent
>> versions. Otherwise you do null DB schema upgrades on every minor release.
>> Maybe that's sane, but it *feels* like they are too separate issues. (what
>> is the version of the DB schema is orthogonal to what version of the code
>> I'm running.) It may be that the clarity and simplification of just one
>> version wins out.
>>
>
> I think it makes sense to just use the Juju version for the DB schema
> version. When you think about it, the DB schema is actually quite tightly
> coupled to the code version so why introduce another set of numbers to
> track? I'm thinking that if there's no schema upgrade steps required for a
> software given version then the DB is left alone except that the schema
> version number gets bumped.
>
>
>> - Introduce a MasterStateServer upgrade target which marks upgrade steps
>>> which are only to run on the master state server. Also more below.
>>>
>>
>> This is just a compiled-in list of steps to apply, right?
>>
>
> Yes. I was thinking that schema upgrade steps would be defined in the same
> place and way that other upgrade steps are currently defined so that they
> could even be interleaved with other kinds of upgrade steps.
>
> What I'm proposing here is that where we currently have 2 types of upgrade
> targets - AllMachines and StateServer - we introduce a third target called
> MasterStateServer which would be primarily (exclusively?) used for schema
> migration steps.
>
>
>>> - Non-master JobManageEnviron machine agents run their upgrade steps as
>>> usual and then watch for EnvironConfig changes. They don't consider the
>>> upgrade to be complete (and therefore let their other workers start) until
>>> database-version matches agent-version. This prevents the new version of
>>> the state server agents from running before the schema migrations for the
>>> new software version have run.
>>>
>>
>> I'm not sure if schema should be done before or after other upgrade
>> steps. Given we're really stopping the world here, it might be prudent to
>> just wait to do your upgrade steps until you know that the DB upgrade has
>> been done.
>>
>
> As mentioned above, with what I'm thinking there is no real distinction
> between schema migration steps and other types of upgrade steps so there's
> no concept of schema migrations happening before or after other upgrade
> steps.
>
>   *Observations/Questions/Issues*
>>
>>>
>>> - There are a lot of moving parts here. What could be made simpler?
>>>
>>> - What do we do if the master mongo database or host fails during the
>>> upgrade? Is it a goal for one of the other state servers take over and run
>>> the schema upgrades itself and let the upgrade finish? If so, is this a
>>> must-have up-front requirement or a nice-to-have?
>>>
>>
>> Some thoughts:
>>
>
>
>> 1. If the actual master mongo DB fails, that will cause reelection, which
>> should cause all of the servers to get their connections to Mongo bounced,
>> and then they'll notice that there is a new master who is responsible for
>> applying the database changes.
>>
>
>  We will have to do some testing to ensure that this scenario actually
> works. Maybe I'm over thinking it, but my gut says there's there's p

Side-by-side diffs on Github

2014-06-05 Thread Menno Smits
For those of you missing side-by-side diffs on Github, have a look at the
Octosplit Chrome extension. I've just started using it and it looks pretty
good.

https://chrome.google.com/webstore/detail/octosplit/mnkacicafjlllhcedhhphhpapmdgjfbb


Re: Reviewing in progress work on Github

2014-06-05 Thread Menno Smits
If we're finding too many shortcomings with the GH code review system we
could look at Gerrit or something else.

It looks like people have already done Gerrit/Github integration before:
http://gerrithub.io/
http://www.packtpub.com/article/using-gerrit-with-github



On 6 June 2014 12:06, Andrew Wilkins  wrote:

> On Thu, Jun 5, 2014 at 9:26 PM, roger peppe  wrote:
>
>> On 5 June 2014 14:22, Andrew Wilkins 
>> wrote:
>> > On Thu, Jun 5, 2014 at 9:02 PM, roger peppe  wrote:
>> >>
>> >> On 5 June 2014 05:07, Menno Smits  wrote:
>> >> > One nice feature is that old comments are hidden
>> >> > if the area of the code they reference is changed by a later diff.
>> >>
>> >> I consider this a serious shortcoming. I generally want to see all the
>> >> conversation for a given code review. Is it possible to reveal
>> >> the old comments somehow?
>> >
>> >
>> > See the comment history on: https://github.com/juju/juju/pull/8
>> >
>> > - I proposed, and jam made some comments
>> > - I addressed the comments in a new commit, rebased and squashed it
>> into the
>> > old, and did a "git push -f"
>> >
>> > There's now a "jameinel commented on an outdated diff". Click on "Show
>> > outdated diff" to the right, and you can see the old comments.
>> >
>> > (I was under the impression that old comments got dropped altogether
>> after
>> > rebase. Not sure if behaviour in GitHub changed at some point, or if it
>> was
>> > always like this...)
>>
>> Ah, that's better than I thought, thanks.
>>
>> Is there a straightforward way of seeing the changes made in response
>> to a particular I(outdated) comment? That's something I did almost every
>> time in Rietveldt.
>>
>
> Not AFAIK. I miss this too.
>


Re: Side-by-side diffs on Github

2014-06-06 Thread Menno Smits
Yeah, it's definitely not perfect. It would be much better if Github had
native support for side-by-side diffs.


On 6 June 2014 22:10, Nate Finch  wrote:

> Yeah, note that it bugs out if you try to expand the context around your
> changes *after* turning on side-by-side mode, so expand what you want first
> and *then* turn on side-by-side mode.
>
> Also I wish it were really side by side mode,  It's still basically
> over/under, just horizontally separated with duplicated context around
> each.  It's *better* but I still miss a real line by line diff that can
> show changes in the middle of the lines, not just old line / new line.
>
>
> On Thu, Jun 5, 2014 at 9:45 PM, Menno Smits 
> wrote:
>
>> For those of you missing side-by-side diffs on Github, have a look at the
>> Octosplit Chrome extension. I've just started using it and it looks pretty
>> good.
>>
>>
>> https://chrome.google.com/webstore/detail/octosplit/mnkacicafjlllhcedhhphhpapmdgjfbb
>>
>>
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>>
>


Git's push.default setting

2014-06-06 Thread Menno Smits
To avoid having to type the branch name when pushing changes to your fork
on Github I would recommend setting push.default to "current". You can then
just push with "git push origin" (no branch name required).

Git v1's default push behaviour if you don't specify the branch name is to
push all branches where the local branch name matches the remote branch
name. This might be more than you really want to push. Using a push.default
of "current" gives both convenience and safety.

There's a good explanation here:  http://stackoverflow.com/a/948397
There's a good explanation here:  http://stackoverflow.com/a/948397


Landing bot merge commit messages need the PR title

2014-06-08 Thread Menno Smits
I've noticed that the commit messages for the merges performed by the Juju
bot don't include the PR title. This can lead to some slightly odd commit
logs.

For example, have a look at PR #13. The merge commit (d8a2c54) message
starts like this:

Merge pull request #13 from dimitern/501-use-networks-constraints

- in "juju deploy"
- update help topics
- change MachineConfig to only have Networks (to include)
...

The bullet list needs the context of the title. It would make more sense
like this:

Merge pull request #13 from dimitern/501-use-networks-constraints

*Start using networks constraints for deployments*

- in "juju deploy"
- update help topics
- change MachineConfig to only have Networks (to include)
...


Looking through the commit logs, most other merge commit messages would
have been clearer with the PR title. Is this easy to fix?
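For illustration, the suggested format could be produced like this (a Go sketch with invented names; the actual landing bot is a separate codebase):

```go
package main

import "fmt"

// mergeCommitMessage builds the suggested format: subject line, then the
// PR title, then the PR description.
func mergeCommitMessage(prNumber int, sourceBranch, title, body string) string {
	return fmt.Sprintf("Merge pull request #%d from %s\n\n%s\n\n%s",
		prNumber, sourceBranch, title, body)
}

func main() {
	fmt.Println(mergeCommitMessage(13,
		"dimitern/501-use-networks-constraints",
		"Start using networks constraints for deployments",
		"- in \"juju deploy\"\n- update help topics"))
}
```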

- Menno


Re: Landing bot merge commit messages need the PR title

2014-06-09 Thread Menno Smits
I'm happy to make the change and test it, but what's the process for
getting it merged and deployed?


On 9 June 2014 19:22, Andrew Wilkins  wrote:

> On Mon, Jun 9, 2014 at 1:39 PM, Menno Smits 
> wrote:
>
>> I've noticed that the commit messages for the merges performed by the
>> Juju bot don't include the PR title. This can lead to some slightly odd
>> commit logs.
>>
>> For example, have a look at PR #13. The merge commit (d8a2c54) message
>> starts like this:
>>
>> Merge pull request #13 from dimitern/501-use-networks-constraints
>>
>> - in "juju deploy"
>> - update help topics
>> - change MachineConfig to only have Networks (to include)
>> ...
>>
>> The bullet list needs the context of the title. It would make more sense
>> like this:
>>
>> Merge pull request #13 from dimitern/501-use-networks-constraints
>>
>> *Start using networks constraints for deployments*
>>
>> - in "juju deploy"
>> - update help topics
>> - change MachineConfig to only have Networks (to include)
>> ...
>>
>>
>> Looking through the commit logs, most other merge commit messages would
>> have been clearer with the PR title. Is this easy to fix?
>>
>
> Looks fairly trivial:
> https://github.com/juju/jenkins-github-lander/blob/develop/src/jenkinsgithublander/github.py#L134
>
> Just need to add in pr["title"].
>
> - Menno
>>
>>
>>
>>
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>>
>


Re: Landing bot merge commit messages need the PR title

2014-06-09 Thread Menno Smits
On 10 June 2014 11:29, Richard Harding  wrote:

> It uses itself to land. Just submit a pull request and it'll get pulled
> into our CI http://ci.jujugui.org:8080/ and I can review/land it.
>
> To update it, you'll have to log into the jenkins server and git pull the
> updated codebase if it was deployed to the jenkins server as a git clone to
> start with. I'm not 100% sure on that part.
>
>
Thanks for the explanation. I've just created a PR for this change.

- Menno


Re: Landing bot merge commit messages need the PR title

2014-06-10 Thread Menno Smits
Thanks Martin.


On 11 June 2014 02:36, Martin Packman  wrote:

> On 10/06/2014, Menno Smits  wrote:
> >
> > Thanks for the explanation. I've just created a PR for this change.
>
> Have pulled in the landed change on the jenkins machine.
>
> Will also propose my various changes today as well.
>
> Martin
>


Re: Schema migration process

2014-06-11 Thread Menno Smits
I've updated the schema migration document with the ideas that have come up
in recent discussions. The scope of the schema migrations work has been
reduced somewhat by making the upgrade step Apply/Rollback concept a
separate project (database changes can be rolled back through the use of
mongobackup/restore).

I've raised a few issues in the comments about handling various failure
modes. Input would be greatly appreciated.

Nate: it would be good for you to have a look at this because we're
planning on leaning on the new backup functionality quite a bit. Let me
know if anything I'm proposing isn't compatible with what your team is
working on.

https://docs.google.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing

On 6 June 2014 13:18, Menno Smits  wrote:

> After some fruitful discussions, Tim and I have come up with something
> that I think is starting to look pretty good. There's a significant change
> to how we handle backups and rollbacks that seems like the right direction.
> I've tried to capture it all in a Google Doc as this email thread is
> starting to get impractical. Feel free to add comments and edit.
>
>
> https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>
>
> On 3 June 2014 13:34, Menno Smits  wrote:
>
>> On 30 May 2014 01:47, John Meinel  wrote:
>>
>>>
>>>
>>>> Building on John's thoughts, and adding Tim's and mine, here's what
>>>> I've got so far::
>>>>
>>>> - Introduce a "database-version" key into the EnvironConfig document
>>>> which tracks the Juju version that the database schema matches. More on
>>>> this later.
>>>>
>>>
>>> For clarity, I would probably avoid putting this key into EnvironConfig,
>>> but instead have it in a separate document. That also makes it easy to
>>> watch for just this value changing.
>>>
>>
>> SGTM. I've got no strong opinion on this.
>>
>>
>>>
>>> Potentially, I would decouple the value in this key from the actual
>>> agent versions. Otherwise you do null DB schema upgrades on every minor
>>> release. Maybe that's sane, but it *feels* like they are too separate
>>> issues. (what is the version of the DB schema is orthogonal to what version
>>> of the code I'm running.) It may be that the clarity and simplification of
>>> just one version wins out.
>>>
>>
>> I think it makes sense to just use the Juju version for the DB schema
>> version. When you think about it, the DB schema is actually quite tightly
>> coupled to the code version so why introduce another set of numbers to
>> track? I'm thinking that if there's no schema upgrade steps required for a
>> software given version then the DB is left alone except that the schema
>> version number gets bumped.
>>
>>
>>> - Introduce a MasterStateServer upgrade target which marks upgrade steps
>>>> which are only to run on the master state server. Also more below.
>>>>
>>>
>>> This is just a compiled-in list of steps to apply, right?
>>>
>>
>> Yes. I was thinking that schema upgrade steps would be defined in the
>> same place and way that other upgrade steps are currently defined so that
>> they could even be interleaved with other kinds of upgrade steps.
>>
>> What I'm proposing here is that where we currently have 2 types of
>> upgrade targets - AllMachines and StateServer - we introduce a third target
>> called MasterStateServer which would be primarily (exclusively?) used for
>> schema migration steps.
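A rough sketch of how a third target could slot in alongside the existing two (names are hypothetical; the real step definitions live in juju's upgrades package):

```go
package main

import "fmt"

// Target mirrors the idea of juju's upgrade targets: each compiled-in
// step is tagged with where it must run.
type Target string

const (
	AllMachines       Target = "allMachines"
	StateServer       Target = "stateServer"
	MasterStateServer Target = "masterStateServer" // proposed third target for schema steps
)

// Step couples an upgrade action with the target it applies to.
type Step struct {
	Description string
	Target      Target
	Run         func() error
}

// stepsFor filters the compiled-in step list down to those that a
// machine with the given target should execute, preserving order so
// schema steps can be interleaved with other kinds of steps.
func stepsFor(all []Step, t Target) []Step {
	var out []Step
	for _, s := range all {
		if s.Target == t {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	steps := []Step{
		{"migrate settings collection", MasterStateServer, func() error { return nil }},
		{"rewrite agent conf", AllMachines, func() error { return nil }},
	}
	fmt.Println(len(stepsFor(steps, MasterStateServer))) // 1
}
```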
>>
>>
>>>> - Non-master JobManageEnviron machine agents run their upgrade steps as
>>>> usual and then watch for EnvironConfig changes. They don't consider the
>>>> upgrade to be complete (and therefore let their other workers start) until
>>>> database-version matches agent-version. This prevents the new version of
>>>> the state server agents from running before the schema migrations for the
>>>> new software version have run.
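A sketch of that wait-for-schema behaviour (hypothetical types; the real code would consume a watcher over the document holding database-version):

```go
package main

import "fmt"

// waitForSchema blocks a non-master state server's startup until the
// stored database-version matches its own agent-version. The changes
// channel stands in for a watcher delivering the current
// database-version; false means the watcher closed first.
func waitForSchema(agentVersion string, changes <-chan string) bool {
	for dbVersion := range changes {
		if dbVersion == agentVersion {
			return true // schema migrated; safe to start the remaining workers
		}
	}
	return false
}

func main() {
	changes := make(chan string, 2)
	changes <- "1.20.0" // still the old schema
	changes <- "1.21.0" // master has finished migrating
	if waitForSchema("1.21.0", changes) {
		fmt.Println("workers starting")
	}
}
```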
>>>>
>>>
>>> I'm not sure if schema should be done before or after other upgrade
>>> steps. Given we're really stopping the world here, it might be prudent to
>>> just wait to do your upgrade steps until you know that the DB upgrade has
>>> been done.
>>>
>>
>> As mentioned above, with what I'm thinking there is no real distinction
>> between schema migration steps and

Re: Schema migration process

2014-06-12 Thread Menno Smits
That's odd because it's the same document as before, which people (including
yourself) commented on. I've just double-checked and it is open for anyone to
view and for Canonical people to view and edit. Were you perhaps not coming
in via your Canonical account?


On 12 June 2014 21:35, John Meinel  wrote:

> oh, and you shared the doc, but you didn't allow comments or editing, so
> we can't actually put comments on there.
> John
> =:->
>
>
> On Thu, Jun 12, 2014 at 1:33 PM, John Meinel 
> wrote:
>
>> If I read the conversations on IRC, they were talking about changing the
>> backup to be just a POST to an HTTP endpoint, and you get back the contents
>> of the DB, which would be deleted when the backup completes. Though you
>> could still probably use whatever internal helpers spool the data to a temp
>> location to do the same for backup & restore.
>>
>>
>> On Thu, Jun 12, 2014 at 8:40 AM, Menno Smits 
>> wrote:
>>
>>> I've updated the schema migration document with the ideas that have come
>>> up in recent discussions. The scope of the schema migrations work has been
>>> reduced somewhat by making the upgrade step Apply/Rollback concept a
>>> separate project (database changes can be rolled back through the use of
>>> mongobackup/restore).
>>>
>>> I've raised a few issues in the comments about handling various failure
>>> modes. Input would be greatly appreciated.
>>>
>>> Nate: it would be good for you to have a look at this because we're
>>> planning on leaning on the new backup functionality quite a bit. Let me
>>> know if anything I'm proposing isn't compatible with what your team is
>>> working on.
>>>
>>>
>>> https://docs.google.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>>
>>> On 6 June 2014 13:18, Menno Smits  wrote:
>>>
>>>> After some fruitful discussions, Tim and I have come up with something
>>>> that I think is starting to look pretty good. There's a significant change
>>>> to how we handle backups and rollbacks that seems like the right direction.
>>>> I've tried to capture it all in a Google Doc as this email thread is
>>>> starting to get impractical. Feel free to add comments and edit.
>>>>
>>>>
>>>> https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>>>
>>>>
>>>> On 3 June 2014 13:34, Menno Smits  wrote:
>>>>
>>>>> On 30 May 2014 01:47, John Meinel  wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>> Building on John's thoughts, and adding Tim's and mine, here's what
>>>>>>> I've got so far:
>>>>>>>
>>>>>>> - Introduce a "database-version" key into the EnvironConfig document
>>>>>>> which tracks the Juju version that the database schema matches. More on
>>>>>>> this later.
>>>>>>>
>>>>>>
>>>>>> For clarity, I would probably avoid putting this key into
>>>>>> EnvironConfig, but instead have it in a separate document. That also
>>>>>> makes it easy to watch for just this value changing.
>>>>>>
>>>>>
>>>>> SGTM. I've got no strong opinion on this.
>>>>>
>>>>>
>>>>>>
>>>>>> Potentially, I would decouple the value in this key from the actual
>>>>>> agent versions. Otherwise you do null DB schema upgrades on every minor
>>>>>> release. Maybe that's sane, but it *feels* like they are two separate
>>>>>> issues. (What version the DB schema is at is orthogonal to what version
>>>>>> of the code I'm running.) It may be that the clarity and simplification
>>>>>> of just one version wins out.
>>>>>>
>>>>>
>>>>> I think it makes sense to just use the Juju version for the DB schema
>>>>> version. When you think about it, the DB schema is actually quite tightly
>>>>> coupled to the code version so why introduce another set of numbers to
>>>>> track? I'm thinking that if there are no schema upgrade steps required
>>>>> for a given software version

Re: Schema migration process

2014-06-12 Thread Menno Smits
You go to sleep for one night and they change everything ... :)

I've just caught up with the new backup system proposal. This changes the
schema migrations design a bit - I'll update the doc.


On 12 June 2014 21:33, John Meinel  wrote:

> If I read the conversations on IRC, they were talking about changing the
> backup to be just a POST to an HTTP endpoint, and you get back the contents
> of the DB, which would be deleted when the backup completes. Though you
> could still probably use whatever internal helpers spool the data to a temp
> location to do the same for backup & restore.
>
>
> On Thu, Jun 12, 2014 at 8:40 AM, Menno Smits 
> wrote:
>
>> I've updated the schema migration document with the ideas that have come
> up in recent discussions. The scope of the schema migrations work has been
>> reduced somewhat by making the upgrade step Apply/Rollback concept a
>> separate project (database changes can be rolled back through the use of
>> mongobackup/restore).
>>
>> I've raised a few issues in the comments about handling various failure
>> modes. Input would be greatly appreciated.
>>
>> Nate: it would be good for you to have a look at this because we're
>> planning on leaning on the new backup functionality quite a bit. Let me
>> know if anything I'm proposing isn't compatible with what your team is
>> working on.
>>
>>
>> https://docs.google.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>
>> On 6 June 2014 13:18, Menno Smits  wrote:
>>
>>> After some fruitful discussions, Tim and I have come up with something
>>> that I think is starting to look pretty good. There's a significant change
>>> to how we handle backups and rollbacks that seems like the right direction.
>>> I've tried to capture it all in a Google Doc as this email thread is
>>> starting to get impractical. Feel free to add comments and edit.
>>>
>>>
>>> https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>>
>>>
>>> On 3 June 2014 13:34, Menno Smits  wrote:
>>>
>>>> On 30 May 2014 01:47, John Meinel  wrote:
>>>>
>>>>>
>>>>>
>>>>>> Building on John's thoughts, and adding Tim's and mine, here's what
>>>>>> I've got so far:
>>>>>>
>>>>>> - Introduce a "database-version" key into the EnvironConfig document
>>>>>> which tracks the Juju version that the database schema matches. More on
>>>>>> this later.
>>>>>>
>>>>>
>>>>> For clarity, I would probably avoid putting this key into
>>>>> EnvironConfig, but instead have it in a separate document. That also makes
>>>>> it easy to watch for just this value changing.
>>>>>
>>>>
>>>> SGTM. I've got no strong opinion on this.
>>>>
>>>>
>>>>>
>>>>> Potentially, I would decouple the value in this key from the actual
>>>>> agent versions. Otherwise you do null DB schema upgrades on every minor
>>>>> release. Maybe that's sane, but it *feels* like they are two separate
>>>>> issues. (What version the DB schema is at is orthogonal to what version
>>>>> of the code I'm running.) It may be that the clarity and simplification
>>>>> of just one version wins out.
>>>>>
>>>>
>>>> I think it makes sense to just use the Juju version for the DB schema
>>>> version. When you think about it, the DB schema is actually quite tightly
>>>> coupled to the code version so why introduce another set of numbers to
>>>> track? I'm thinking that if there are no schema upgrade steps required
>>>> for a given software version then the DB is left alone except that the schema
>>>> version number gets bumped.
>>>>
>>>>
>>>>> - Introduce a MasterStateServer upgrade target which marks upgrade
>>>>>> steps which are only to run on the master state server. Also more below.
>>>>>>
>>>>>
>>>>> This is just a compiled-in list of steps to apply, right?
>>>>>
>>>>
>>>> Yes. I was thinking that schema upgrade steps would be defined in the
>>>> same place and way that other upgrade steps are currently defined so that

Re: How to show diff for a rev?

2014-06-18 Thread Menno Smits
Try the poorly documented -m option to git show, like this:

git show -m <rev>

For 7360086, this gives exactly the same output as git
diff 7360086^ 7360086.

For 348c104, the output is almost the same as "git diff 348c104^ 348c104"
except there are some additional hunks at the bottom which I haven't spent
much time tracing to their origin. The additional bits certainly look
related to the change.




On 18 June 2014 23:37, Martin Packman  wrote:

> On 18/06/2014, John Meinel  wrote:
> >
> > So the only syntax that reliably gives me what I want is:
> >  git diff 348c104^ 348c104
> > I was hoping there would be a better shortcut for it. Does anyone have
> some
> > more voodoo that I could use to avoid having to type the same thing
> twice?
>
> That's what I've always done. Often have shas (or sha heads) on my
> clipboard...
>
> Seems like you could do something like this though:
>
> $ git config --global alias.d '!sh -c "git diff $1^ $1" -'
> $ git d 348c104
>
> Martin
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>


panics in apiserver/root.go

2014-06-22 Thread Menno Smits
There have been a few panics during test runs in the API server code
recently that look like this:

http://juju-ci.vapour.ws:8080/job/github-merge-juju/221/consoleFull
http://juju-ci.vapour.ws:8080/job/github-merge-juju/223/consoleFull

I've also seen it happen during test runs on my personal machine today
(once out of tens of runs of the same test suite). Given where the panic is
occurring could it be related to the recent API versioning changes?

- Menno


Re: panics in apiserver/root.go

2014-06-22 Thread Menno Smits
Talk about quick turn-around! I'm glad you've (probably) found it.



On 23 June 2014 17:32, John Meinel  wrote:

> I think I figured it out, though I'm not 100% sure. I believe we're
> accessing a go map without a mutex, so you can get concurrent access. I'll
> see if I can create a test case that triggers the race detector and fix it.
>
> John
> =:->
>
>
> On Mon, Jun 23, 2014 at 9:24 AM, John Meinel 
> wrote:
>
>> I have not seen them before, but it certainly is likely to be related,
>> I'll dig into it today.
>>
>> John
>> =:->
>>
>>
>> On Mon, Jun 23, 2014 at 8:21 AM, Menno Smits 
>> wrote:
>>
>>>  There's been a few panics during test runs in the API server code
>>> recently that look like this:
>>>
>>> http://juju-ci.vapour.ws:8080/job/github-merge-juju/221/consoleFull
>>> http://juju-ci.vapour.ws:8080/job/github-merge-juju/223/consoleFull
>>>
>>> I've also seen it happen during test runs on my personal machine today
>>> (once out of 10's of runs of the same test suite). Given where the panic is
>>> occurring could it be related to the recent API versioning changes?
>>>
>>> - Menno
>>>
>>>
>>>
>>
>


Re: Thoughts to keep in mind for Code Review

2014-06-25 Thread Menno Smits
I completely agree with Ian's point about code needing to be
self-explanatory and stand on its own.

That said, the article mentions that the process of creating review
annotations encourages the author to review their own work in a way that
they may have not done otherwise, eliminating problems before anyone else
even looks at the code. Effectively, a phase of self review happens before
peer review driving the defect rate down and probably making the peer
review more efficient.




On 25 June 2014 19:43, roger peppe  wrote:

> That's very interesting; thanks for the link.
>
> One part that jumps out at me:
>
> : ... many teams that review code don't have a good way of tracking
> defects found
> : during review and ensuring that bugs are actually fixed before the
> review is complete
>
> We don't have this, and with github reviews it's now even harder (for
> reviewer
> or coder) to verify this by going back to try to correlate code review
> remarks with
> changes made.
>
> About pre-review annotations, I agree with Ian that the code should be
> documented well enough that someone coming to it from scratch can
> understand it, but I also wonder if there is room for review-specific
> comments, talking about reasons for the changes themselves in the
> specific context of that review.
> The code review does not show all the code in the system, and the changes
> made will often be made with respect to other areas of the code base. I can
> see how a guide to how the changes fit within the context of the whole
> system, and *why* we're making the changes themselves (rather than how the
> changed code functions once the changes are made), might be a useful thing.
>
>   cheers,
> rog.
>
>
> On 25 June 2014 05:20, John Meinel  wrote:
> > An interesting article from IBM:
> >
> http://www.ibm.com/developerworks/rational/library/11-proven-practices-for-peer-review/
> >
> > There is a pretty strong bias for "we found these results and look at how
> > our tool makes it easier to follow these guidelines", but the core
> > results are actually pretty good.
> >
> > I certainly recommend reading it and keeping some of it in mind while
> > you're both coding and reviewing. (Particularly how long code review
> > should take, and how much code should be put up for review at a time.)
> > Another trick that we haven't made much use of is to annotate the code we
> > put up for review. We have the summary description, but you can certainly
> > put some inline comments on your own proposal if you want to highlight
> > areas more clearly.
> >
> > John
> > =:->
> >
> >
>
>


Re: Critical bugs for 1.20.

2014-06-26 Thread Menno Smits
I'm currently taking a look at #1334273. It could be related to the
"restricted API logins during upgrades" change I made.


On 27 June 2014 06:00, Curtis Hovey-Canonical  wrote:

> We are tracking a list of critical bugs that must be fixed to release 1.20.
> https://launchpad.net/juju-core/+milestone/1.20.0
>
> I intend to create a 1.20 branch in
> https://github.com/juju/juju
> that we can merge fixes into. Actually I have done that, but github
> doesn't want me to merge my version branch... I need to tinker with the
> issue to resolve it.
>
> I think the 1.20 branch needs to be version 1.20.0
> I propose we set the master branch to 1.21-alpha1 now so that
> development can continue to add new features.
>
> --
> Curtis Hovey
> Canonical Cloud Development and Operations
> http://launchpad.net/~sinzui
>
>


Re: Critical bugs for 1.20.

2014-06-29 Thread Menno Smits
thumper, axw and I spent quite a bit of time on this on Friday and think we
know what's happening (nothing to do with the API logins change). The
potential fix is here:

https://github.com/juju/juju/pull/188


On 27 June 2014 12:22, Menno Smits  wrote:

> I'm currently taking a look at #1334273. It could be related to the
> "restricted API logins during upgrades" change I made.
>
>
> On 27 June 2014 06:00, Curtis Hovey-Canonical 
> wrote:
>
>> We are tracking a list of critical bugs that must be fixed to release
>> 1.20.
>> https://launchpad.net/juju-core/+milestone/1.20.0
>>
>> I intend to create a 1.20 branch in
>> https://github.com/juju/juju
>> that we can merge fixes into. Actually I have done that, but github
>> doesn't want me to merge by version branch...I need to tinker with the
>> issue to resolve it.
>>
>> I think the 1.20 branch needs to be version 1.20.0
>> I propose we set the master branch to 1.21-alpha1 now so that
>> development can continue to add new features.
>>
>> --
>> Curtis Hovey
>> Canonical Cloud Development and Operations
>> http://launchpad.net/~sinzui
>>
>>
>
>


Dropbox has open sourced some of their Go libraries

2014-07-07 Thread Menno Smits
Hey look - another errors package! */me ducks*

https://tech.dropbox.com/2014/07/open-sourcing-our-go-libraries/


Current handling of failed upgrades is screwy

2014-07-10 Thread Menno Smits
So I've noticed that the way we currently handle failed upgrades in the
machine agent doesn't make a lot of sense.

Looking at cmd/jujud/machine.go:821, an error is created if
PerformUpgrade() fails but nothing is ever done with it. It's not returned
and it's not logged. This means that if upgrade steps fail, the agent
continues running with the new software version, probably with partially
applied upgrade steps, and there is no way to know.

I have a unit-tested fix ready which causes the machine agent to exit (by
returning the error as a fatalError) if PerformUpgrade fails, but before
proposing it I realised that's not the right thing to do. The agent's upstart
script will restart the agent and probably cause the upgrade to run and
fail again, so we end up with an endless restart loop.

The error could also be returned as a "non-fatal" (to the runner) error but
that will just cause the upgrade-steps worker to continuously restart,
attempting the upgrade and failing.

Another approach could be to set the global agent-version back to the
previous software version before killing the machine agent but other agents
may have already upgraded and we can't currently roll them back in any
reliable way.

Our upgrade story will be improving in the coming weeks (I'm working on
that). In the mean time what should we do?

Perhaps the safest thing to do is just log the error and keep the agent
running the new version and hope for the best? There is a significant
chance of problems but this is basically what we're doing now (except
without logging that there's a problem).

Does anyone have a better idea?

- Menno


Re: Current handling of failed upgrades is screwy

2014-07-13 Thread Menno Smits
On 10 July 2014 20:57, John Meinel  wrote:

> I think it fundamentally comes down to "is the reason upgrade failed
> transient or permanent", if we can try again later, do so, else log at
> Error level, and keep on with your life, because that is the only chance of
> recovery (from what you've said, at least).
>

This is a good approach but I don't see any way that the machine agent can
know if an error is transient or permanent with any certainty.

Tim has contributed some useful guidance. Given that we currently have no
reliable way of automatically rolling back upgrades, we should aim to just
stay on the new software version (this is what we silently do now anyway).
Instead of stopping on the first failed upgrade step, all upgrade steps
should be attempted and every failure logged, the thinking being that the
environment is more likely to be operational the more upgrade steps have
run.

This approach will also be used for the upcoming upgrade changes when
backups aren't available (i.e. when upgrading from a version that doesn't
support the backup API). If backups are available then upgrades will be
aborted after the first failure with the backup being used to roll back any
changes that may have been made.





>
> John
> =:->
>
>
> On Thu, Jul 10, 2014 at 11:18 AM, Menno Smits 
> wrote:
>
>> So I've noticed that the way we currently handle failed upgrades in the
>> machine agent doesn't make a lot of sense.
>>
>> Looking at cmd/jujud/machine.go:821, an error is created if
>> PerformUpgrade() fails but nothing is ever done with it. It's not returned
>> and it's not logged. This means that if upgrade steps fail, the agent
>> continues running with the new software version, probably with partially
>> applied upgrade steps, and there is no way to know.
>>
>> I have a unit tested fix ready which causes the machine agent to exit (by
>> returning the error as a fatalError) if PerformUpgrade fails but before
>> proposing I realised that's not the right thing to do. The agent's upstart
>> script will restart the agent and probably cause the upgrade to run and
>> fail again so we end up with an endless restart loop.
>>
>> The error could also be returned as a "non-fatal" (to the runner) error
>> but that will just cause the upgrade-steps worker to continuously restart,
>> attempting the upgrade and failing.
>>
>> Another approach could be to set the global agent-version back to the
>> previous software version before killing the machine agent but other agents
>> may have already upgraded and we can't currently roll them back in any
>> reliable way.
>>
>> Our upgrade story will be improving in the coming weeks (I'm working on
>> that). In the mean time what should we do?
>>
>> Perhaps the safest thing to do is just log the error and keep the agent
>> running the new version and hope for the best? There is a significant
>> chance of problems but this is basically what we're doing now (except
>> without logging that there's a problem).
>>
>> Does anyone have a better idea?
>>
>> - Menno
>>
>>
>>
>>
>>
>>
>>
>


Re: Current handling of failed upgrades is screwy

2014-07-15 Thread Menno Smits
OK - points taken.

So taking your ideas and extending them a little, I'm thinking:

   - retry upgrade steps on failure (with inter-attempt delay)
   - indicate when there's upgrade problems by setting the machine agent
   status
   - if despite the retries the upgrade won't complete, report this in
   status and keep the agent running, but with the restricted API in place
   and most workers not started (i.e. as if the upgrade is still running).
   This allows "juju status" and "juju ssh" to work unless a significant
   upgrade step they depend on hasn't run.

Does that sound reasonable?




On 15 July 2014 19:33, William Reade  wrote:

> FWIW, we could set some error status on the affected agent (so users can
> see there's a problem) and make it return 0 (so that upstart doesn't keep
> hammering it); but as jam points out that's not helpful when it's a
> transient error. I'd suggest retrying a few times, with some delay between
> attempts, before we do so (although reporting the error, and making it
> clear that we'll retry automatically, is probably worthwhile).
>
> And, really, I'm not very keen on the prospect of continuing to run when
> we know upgrade steps have failed -- IMO this puts us in an essentially
> unknowable state, and I'd much rather fail hard and early than limp along
> pretending to work correctly. Manual recovery of a failed upgrade will
> surely be tedious whatever we do, but a failed upgrade won't affect the
> operation of properly-written charms -- it's a management failure, so you
> can't scale/relate/whatever, but the actual software deployed will keep
> running. However, I can easily imagine that continuing to run juju agents
> against truly broken state could lead to services actually being shut
> down/misconfigured, and I think that's much more harmful.
>
> Cheers
> William
>
>
> On Thu, Jul 10, 2014 at 9:57 AM, John Meinel 
> wrote:
>
>> I think it fundamentally comes down to "is the reason upgrade failed
>> transient or permanent", if we can try again later, do so, else log at
>> Error level, and keep on with your life, because that is the only chance of
>> recovery (from what you've said, at least).
>>
>> John
>> =:->
>>
>>
>> On Thu, Jul 10, 2014 at 11:18 AM, Menno Smits 
>> wrote:
>>
>>> So I've noticed that the way we currently handle failed upgrades in the
>>> machine agent doesn't make a lot of sense.
>>>
>>> Looking at cmd/jujud/machine.go:821, an error is created if
>>> PerformUpgrade() fails but nothing is ever done with it. It's not returned
>>> and it's not logged. This means that if upgrade steps fail, the agent
>>> continues running with the new software version, probably with partially
>>> applied upgrade steps, and there is no way to know.
>>>
>>> I have a unit tested fix ready which causes the machine agent to exit
>>> (by returning the error as a fatalError) if PerformUpgrade fails but before
>>> proposing I realised that's not the right thing to do. The agent's upstart
>>> script will restart the agent and probably cause the upgrade to run and
>>> fail again so we end up with an endless restart loop.
>>>
>>> The error could also be returned as a "non-fatal" (to the runner) error
>>> but that will just cause the upgrade-steps worker to continuously restart,
>>> attempting the upgrade and failing.
>>>
>>> Another approach could be to set the global agent-version back to the
>>> previous software version before killing the machine agent but other agents
>>> may have already upgraded and we can't currently roll them back in any
>>> reliable way.
>>>
>>> Our upgrade story will be improving in the coming weeks (I'm working on
>>> that). In the mean time what should we do?
>>>
>>> Perhaps the safest thing to do is just log the error and keep the agent
>>> running the new version and hope for the best? There is a significant
>>> chance of problems but this is basically what we're doing now (except
>>> without logging that there's a problem).
>>>
>>> Does anyone have a better idea?
>>>
>>> - Menno
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>


Re: Current handling of failed upgrades is screwy

2014-07-16 Thread Menno Smits
On 16 July 2014 18:55, William Reade  wrote:

> On Wed, Jul 16, 2014 at 3:46 AM, Menno Smits 
> wrote:
>
>> OK - points taken.
>>
>> So taking your ideas and extending them a little, I'm thinking:
>>
>>- retry upgrade steps on failure (with inter-attempt delay)
>>- indicate when there's upgrade problems by setting the machine agent
>>status
>>- if despite the retries the upgrade won't complete, report this in
>>status and keep the agent running but with the restricted API in place and
>>most workers not complete (i.e. as if the upgrade is still running). This
>>allows "juju status" and "juju ssh" to work unless there's a significant
>>upgrade step that hasn't run that prevents them from working.
>>
>> Does that sound reasonable?
>>
>
> I think that's reasonable, yeah; only caveat is that if we *are* in a
> position to roll back the whole upgrade we should do so. For these
> measures, I'm thinking more about situations where the state server has
> successfully upgraded but some satellite agents have failed.
>

Absolutely. The above only applies when there is no backup available to
roll back to - i.e. when the version being upgraded from didn't support
storing backups server side. If there is a backup the machine agent will
roll back to the previous version if the upgrade process fails.


Re: Current handling of failed upgrades is screwy

2014-07-16 Thread Menno Smits
On 16 July 2014 22:36, David Cheney  wrote:

> If that is the blocker, can we introduce a major version which does
> not change the schema at all? Then we know that everyone running Juju
> has a functional backup system. As I understand it, we require people
> to upgrade in order, without skipping versions.
>

Doing a stable release for exactly this reason (sooner than it might
otherwise happen) has already been discussed in some conversations.


log spam related to /etc/network/interfaces

2014-07-16 Thread Menno Smits
I've just noticed that Juju is currently repeating the following every 3
seconds on my dev machine when using the local provider:

machine-0: 2014-07-17 05:12:32 ERROR juju.worker runner.go:218 exited
"networker": open /etc/network/interfaces: no such file or directory

/etc/network/interfaces does indeed not exist on my laptop (NetworkManager
takes care of networking).

Seems like something needs to be changed to account for this case. Any
ideas?
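One possible direction, sketched with an invented helper (the real networker worker setup lives in the machine agent's worker list):

```go
package main

import (
	"fmt"
	"os"
)

// shouldStartNetworker reports whether the networker worker should be
// started at all: on hosts where NetworkManager handles networking,
// /etc/network/interfaces doesn't exist, and restarting a worker that
// immediately dies every few seconds only spams the log.
func shouldStartNetworker(configPath string) bool {
	_, err := os.Stat(configPath)
	return err == nil
}

func main() {
	fmt.Println(shouldStartNetworker("/nonexistent/network/interfaces")) // false
}
```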

- Menno


Re: backup API in 1.20

2014-07-29 Thread Menno Smits
>
>
> But I'm a bit suspicious... would someone please confirm that we don't
> have *any* released clients that use the POST API? The above is predicated
> on that assumption.
>

The juju command line client hasn't yet seen changes to call the backup
POST API. The work was seemingly being done "server first, then client"
(which seems logical) and the client work didn't make the release (and isn't
done even today).

Eric's proposal seems totally fine to me.


help please: mongo/mgo panic

2014-07-29 Thread Menno Smits
All,

Various people have been seeing the machine agents panic with the following
message:

   panic: rescanned document misses transaction in queue

The error message comes from mgo but the actual cause is unknown. There's
plenty of detail in the comments for the LP bug that's tracking this. If
you have any ideas about a possible cause or how to debug this further
please weigh in.

https://bugs.launchpad.net/juju-core/+bug/1318366

Thanks,
Menno


Please use gopkg.in for importing mgo

2014-07-31 Thread Menno Smits
Trunk is currently broken if building using a clean GOPATH because
revision 03e56dcd was recently merged which imports mgo from labix.org
instead of gopkg.in. We no longer use mgo from labix.org and godeps no
longer installs it from that location.

The following import paths should be used instead:

gopkg.in/mgo.v2
gopkg.in/mgo.v2/bson
gopkg.in/mgo.v2/txn

This was perhaps not publicised well enough but the switch was made a
couple of weeks ago.

Right now juju will only build on machines that incidentally have a
labix.org mgo install. If the machine doesn't already have it, godeps won't
install it and builds fail.

I imagine the problem revision got past the landing bot because our test
hosts still have the labix.org mgo installed. If so, this should be cleaned
up.

I will fix the problem imports in the upgrades package now.

- Menno


Re: Please use gopkg.in for importing mgo

2014-07-31 Thread Menno Smits
On 1 August 2014 12:09, Ian Booth  wrote:

> > The following import paths should be used instead:
> >
> > gopkg.in/mgo.v2
> > gopkg.in/mgo.v2/bson
> > gopkg.in/mgo.v2/txn
> >
> > This was perhaps not publicised well enough but the switch was made a
> > couple of weeks ago.
> >
>
> FWIW, the switch was only done yesterday in order to pick up the new mgo
> driver
> version needed to fix a mongo panic, and the changes were only done on
> Github
> and not Launchpad, hence the import path change.
>

Not quite. Trunk has been using mgo from gopkg.in since July 28. A similar
change was done on the 1.20 branch yesterday so that we could get the
recent mgo fix there.


Re: determine if juju is upgrading

2014-08-05 Thread Menno Smits
In recent versions of Juju (i.e. post 1.20) the "agent-state-info" field
for each machine agent in the status output will say "upgrading to
" while the upgrade is in progress. This could be used by tests
to know when the upgrade is finished.




On 6 August 2014 05:40, Horacio Duran  wrote:

> Hey, I have been running several CI tests lately and find very often the
> following error:
> 2014-08-04 22:27:42 ERROR juju.cmd supercommand.go:323 upgrade in progress
> -
> At least when my machine is not under heavy load and I am at decent
> network reach of amazon.
> I wonder, is there a way to poll juju to know if its upgrading?
>
> I think it is a bit drastic having CI tests fail for this (unless the test
> checks is related to upgrade in some form) I believe that being able to do
> the check and have the CI test check with a timeout before proceeding with
> the rest of the tests would yield much cleaner results when we need to
> review CI runs that failed.
>
> Cheers.
> --
> Horacio.
>


avoiding unnecessarily entering "upgrade mode"

2014-08-05 Thread Menno Smits
Right now, a Juju machine agent is in "upgrade mode" from the moment it
starts until the upgrade-steps worker is finished. During this period API
logins are heavily restricted and most of the agent's workers don't start
until upgrade mode stops.

This happens even when there is no upgrade to perform. The upgrade-steps
worker always runs at machine agent startup and upgrade mode is in force
until it finishes.

Upgrade mode is typically short-lived (say 10 seconds) but if something is
wrong (e.g. mongo issues) the upgrade-steps worker may take longer or never
finish, resulting in the user seeing lots of "upgrade in progress" messages
from the client and in the logs.
This is particularly confusing when a user hasn't even requested an upgrade
themselves.

I would like to change the machine agent so that upgrade mode is only
entered if the version in agent.conf is different from the running software
version. This would mean that upgrade mode is only entered if there is an
actual upgrade to perform.

The version in agent.conf is only updated after a successful upgrade so it
is the right thing to use to determine if upgrade mode should be entered.

The current behaviour means that the (idempotent) upgrade steps for the
current version are always run each time the machine agent starts. If the
change I'm proposing is implemented this will no longer happen. Does this
seem like a problem to anyone? For instance, do we rely on the upgrade
steps for the current version being run after bootstrap?

The ticket for this work is at: https://bugs.launchpad.net/bugs/1350111

Cheers,
Menno


Re: issuing a jujud restart

2014-08-10 Thread Menno Smits
How this happens is slightly complex but the short answer is that if any of
jujud's workers exit with a fatalError (as defined in cmd/jujud/agent.go),
then jujud will terminate (and then be restarted by upstart).

I'm not sure how you should propagate the need to exit from the restore API
call through to the worker but I'm sure that's doable.

HTH,
Menno



On 9 August 2014 11:32, Horacio Duran  wrote:

> Hey, I am currently working on restore command for juju state servers. I
> find myself in the need, after putting all the back-up parts in place, of
> restarting jujud, from within jujud. Can anyone shed some light on how to
> do that?
> Thank you all.
> Cheers
> --
> Horacio Durán
>
>


Re: avoiding unnecessarily entering "upgrade mode"

2014-08-10 Thread Menno Smits
On 6 August 2014 18:32, John Meinel  wrote:

> That sounds like what I would have expected was happening (we only run
> upgrade steps if we think they have a reason to change things, and then
> only set the last known version once all the upgrade steps have finished.)
>

> I'm concerned about the "if there is a mongo problem", because if we're
> running into problems for things we think are idempotent, then we might
> have a genuine problem with them.
>

Sorry, I wasn't clear. I didn't mean to imply there were any problems with
existing upgrade steps. I was referring to the recent mongo replicaset
problems we've been having - they were sometimes preventing the
upgrade-steps worker from finishing so we never came out of "upgrade mode"
causing many "upgrade in progress" messages to be spewed to the logs.

By making the change I'm proposing, we can avoid the unnecessary and
confusing log noise in the case where no upgrade needs to be performed.


Re: avoiding unnecessarily entering "upgrade mode"

2014-08-10 Thread Menno Smits
On 6 August 2014 19:09, Dimiter Naydenov 
wrote:


> > I would like to change the machine agent so that upgrade mode is
> > only entered if the version in agent.conf is different from the
> > running software version. This would mean that upgrade mode is
> > only entered if there is an actual upgrade to perform.
> If you are referring to Upgraded-To version field in agent config, I
> think this is set after the upgrade completes, so it might be
> unavailable before that.
>

I think this should be OK. The upgradedToVersion field in an agent.conf
will be set to the last version for which upgrade steps were successfully
performed. If this is the same as version.Current then we know there will
be no upgrade to perform. If they are different, or if upgradedToVersion is
not set, then we know there may be upgrade steps to perform.

Thanks,
Menno


Re: avoiding unnecessarily entering "upgrade mode"

2014-08-10 Thread Menno Smits
On 7 August 2014 03:13, William Reade  wrote:

> SGTM too. It should always have worked like this -- rerunning all our
> upgrade steps every time is *crazy*.
>
>
Sorry, I got this detail wrong. The machine agent isn't rerunning upgrade
steps unnecessarily, but it is waiting for an unnecessarily long time
before deciding whether upgrades are required. This is what I'm
fixing (today).

- Menno


Re: issuing a jujud restart

2014-08-10 Thread Menno Smits
On 11 August 2014 13:48, Andrew Wilkins 
wrote:

> On Mon, Aug 11, 2014 at 5:41 AM, Menno Smits 
> wrote:
>
>> How this happens is slightly complex but the short answer is that if any
>> of jujud's workers exit with a fatalError (as defined in
>> cmd/jujud/agent.go), then jujud will terminate (and then be restarted by
>> upstart).
>>
>> I'm not sure how you should propagate the need to exit from the restore
>> API call through to the worker but I'm sure that's doable.
>>
>
> We currently have ErrTerminateAgent, which signals that the agent should
> uninstall itself and then terminate. We probably just want an
> ErrRestartAgent which exits the process; upstart will restart it.
>
>
I believe this is already what will happen now if a worker returns
fatalError, but creating an ErrRestartAgent and handling it appropriately
would probably be cleaner.


Intentionally introducing failures into Juju

2014-08-13 Thread Menno Smits
There's been some discussion recently about adding some feature to Juju to
allow developers or CI tests to intentionally trigger otherwise hard to
induce failures in specific parts of Juju. The idea is that sometimes we
need some kind of failure to happen in a CI test or when manually testing
but those failures can often be hard to make happen.

For example, for changes to Juju's upgrade mechanics that I'm working on at
the moment I would like to ensure that an upgrade is cleanly aborted if one
of the state servers in an HA environment refuses to start the upgrade. This
logic is well unit tested but there's nothing like seeing it actually work
in a real environment to build confidence - however, it isn't easy to make
a state server misbehave in this way.

To help with this kind of testing scenario, I've created a new top-level
package called "wrench" which lets us "drop a wrench in the works" so to
speak. It's very simple with one main API which can be called from
judiciously chosen points in Juju's execution to decide whether some
failure should be triggered.

The module looks for files in $jujudatadir/wrench (typically
/var/lib/juju/wrench) on the local machine. If I wanted to trigger the
upgrade failure described above I could drop a file in that directory on
one of the state servers named say "machine-agent" with the content:

refuse-upgrade

Then in some part of jujud's upgrade code there could be a check like:

if wrench.IsActive("machine-agent", "refuse-upgrade") {
 // trigger the failure
}

The idea is this check would be left in the code to aid CI tests and future
manual tests.

You can see the incomplete wrench package here:
https://github.com/juju/juju/pull/508

There are a few issues to nut out.

1. It needs to be difficult/impossible for someone to accidentally or
maliciously activate this feature, especially in production environments. I
have almost finished (but not pushed to Github) some changes to the wrench
package which make it strict about the ownership and permissions on the
wrench files. This should make it harder for the wrong person to drop files
into the wrench directory.

The idea has also been floated to only enable this functionality in
non-stable builds. This certainly gives a good level of protection but I'm
slightly wary of this approach because it makes it impossible for CI to
take advantage of the wrench feature when testing stable release builds.
I'm happy to be convinced that the benefit is worth the cost.

Other ideas on how to better handle this are very welcome.

2. The wrench functionality needs to be disabled during unit test runs
because we don't want any wrench files a developer may have lying around to
affect Juju's behaviour during test runs. The wrench package has a global
on/off switch so I plan on switching it off in BaseSuite's setup or similar.

3. The name is a bikeshedding magnet :)  Other names that have been bandied
about for this feature are "chaos" and "spanner". I don't care too much so
if there's a strong consensus for another name let's use that. I chose
"wrench" over "spanner" because I believe that's the more common usage in
the US and because Spanner is a DB from Google. Let's not get carried
away...

All comments, ideas and concerns welcome.

- Menno


Re: Intentionally introducing failures into Juju

2014-08-13 Thread Menno Smits
I like the idea of being able to trigger failures using the juju command line.

I'm undecided about how the need to fail should be stored. An obvious
location would be in a new collection managed by state, or even as a field
on existing state objects and documents. The downside of this approach is
that a connection to state will then need to be available from where-ever
we would like failures to be triggered - this isn't always possible or
convenient.

Another approach would be to have "juju inject-failure" drop files in some
location (along the lines of what I've already implemented) using SSH. This
has the advantage of making the failure checks easy to perform from
anywhere with the disadvantage of making it more difficult to manage
existing failures. There would also be some added complexity when creating
failure files for about-to-be-created entities (e.g. the "juju deploy
--inject-failure" case).

Do you have any thoughts on this?




On 14 August 2014 02:25, Gustavo Niemeyer 
wrote:

> That's a nice direction, Menno.
>
> The main thing that comes to mind is that it sounds quite inconvenient
> to turn the feature on. It may sound otherwise because it's so easy to
> drop files at arbitrary places in our local machines, but when dealing
> with a distributed system that knows how to spawn its own resources
> up, suddenly the "just write a file" becomes surprisingly boring and
> race prone.
>
> What about:
>
> juju inject-failure [--unit=unit] [--service=service] "<failure name>"?
> juju deploy [--inject-failure=name] ...
>
>
>
> On Wed, Aug 13, 2014 at 7:17 AM, Menno Smits 
> wrote:
> > There's been some discussion recently about adding some feature to Juju
> to
> > allow developers or CI tests to intentionally trigger otherwise hard to
> > induce failures in specific parts of Juju. The idea is that sometimes we
> > need some kind of failure to happen in a CI test or when manually testing
> > but those failures can often be hard to make happen.
> >
> > For example, for changes Juju's upgrade mechanics that I'm working on at
> the
> > moment I would like to ensure that an upgrade is cleanly aborted if one
> of
> > the state servers in a HA environment refuses to start the upgrade. This
> > logic is well unit tested but there's nothing like seeing it actually
> work
> > in a real environment to build confidence - however, it isn't easy to
> make a
> > state server misbehave in this way.
> >
> > To help with this kind of testing scenario, I've created a new top-level
> > package called "wrench" which lets us "drop a wrench in the works" so to
> > speak. It's very simple with one main API which can be called from
> > judiciously chosen points in Juju's execution to decide whether some
> failure
> > should be triggered.
> >
> > The module looks for files in $jujudatadir/wrench (typically
> > /var/lib/juju/wrench) on the local machine. If I wanted to trigger the
> > upgrade failure described above I could drop a file in that directory on
> one
> > of the state servers named say "machine-agent" with the content:
> >
> > refuse-upgrade
> >
> > Then in some part of jujud's upgrade code there could be a check like:
> >
> > if wrench.IsActive("machine-agent", "refuse-upgrade") {
> >  // trigger the failure
> > }
> >
> > The idea is this check would be left in the code to aid CI tests and
> future
> > manual tests.
> >
> > You can see the incomplete wrench package here:
> > https://github.com/juju/juju/pull/508
> >
> > There are a few issues to nut out.
> >
> > 1. It needs to be difficult/impossible for someone to accidentally or
> > maliciously activate this feature, especially in production
> environments. I
> > have almost finished (but not pushed to Github) some changes to the
> wrench
> > package which make it strict about the ownership and permissions on the
> > wrench files. This should make it harder for the wrong person to drop
> files
> > in to the wrench directory.
> >
> > The idea has also been floated to only enable this functionality in
> > non-stable builds. This certainly gives a good level of protection but
> I'm
> > slightly wary of this approach because it makes it impossible for CI to
> take
> > advantage of the wrench feature when testing stable release builds. I'm
> > happy to be convinced that the benefit is worth the cost.
> >
> > Other ideas on how to better handle this are very welcome.
> >
> > 2. The wrench func

Re: Intentionally introducing failures into Juju

2014-08-13 Thread Menno Smits
I like the idea of being able to trigger failures stochastically. I'll
integrate this into whatever we settle on for Juju's failure injection.


On 14 August 2014 02:29, Gustavo Niemeyer 
wrote:

> Ah, and one more thing: when developing the chaos-injection mechanism
> in the mgo/txn package, I also added both a "chance" parameter for
> either killing or slowing down a given breakpoint. It sounds like it
> would be useful for juju's mechanism too. If you kill every time, it's
> hard to tell whether the system would know how to retry properly.
> Killing or slowing down just sometimes, or perhaps the first 2 times
> out of every 3, for example, would enable the system to recover
> itself, and an external agent to ensure it continues to work properly.
>
> On Wed, Aug 13, 2014 at 11:25 AM, Gustavo Niemeyer
>  wrote:
> > That's a nice direction, Menno.
> >
> > The main thing that comes to mind is that it sounds quite inconvenient
> > to turn the feature on. It may sound otherwise because it's so easy to
> > drop files at arbitrary places in our local machines, but when dealing
> > with a distributed system that knows how to spawn its own resources
> > up, suddenly the "just write a file" becomes surprisingly boring and
> > race prone.
> >
> > What about:
> >
> > juju inject-failure [--unit=unit] [--service=service] "<failure name>"?
> > juju deploy [--inject-failure=name] ...
> >
> >
> >
> > On Wed, Aug 13, 2014 at 7:17 AM, Menno Smits 
> wrote:
> >> There's been some discussion recently about adding some feature to Juju
> to
> >> allow developers or CI tests to intentionally trigger otherwise hard to
> >> induce failures in specific parts of Juju. The idea is that sometimes we
> >> need some kind of failure to happen in a CI test or when manually
> testing
> >> but those failures can often be hard to make happen.
> >>
> >> For example, for changes Juju's upgrade mechanics that I'm working on
> at the
> >> moment I would like to ensure that an upgrade is cleanly aborted if one
> of
> >> the state servers in a HA environment refuses to start the upgrade. This
> >> logic is well unit tested but there's nothing like seeing it actually
> work
> >> in a real environment to build confidence - however, it isn't easy to
> make a
> >> state server misbehave in this way.
> >>
> >> To help with this kind of testing scenario, I've created a new top-level
> >> package called "wrench" which lets us "drop a wrench in the works" so to
> >> speak. It's very simple with one main API which can be called from
> >> judiciously chosen points in Juju's execution to decide whether some
> failure
> >> should be triggered.
> >>
> >> The module looks for files in $jujudatadir/wrench (typically
> >> /var/lib/juju/wrench) on the local machine. If I wanted to trigger the
> >> upgrade failure described above I could drop a file in that directory
> on one
> >> of the state servers named say "machine-agent" with the content:
> >>
> >> refuse-upgrade
> >>
> >> Then in some part of jujud's upgrade code there could be a check like:
> >>
> >> if wrench.IsActive("machine-agent", "refuse-upgrade") {
> >>  // trigger the failure
> >> }
> >>
> >> The idea is this check would be left in the code to aid CI tests and
> future
> >> manual tests.
> >>
> >> You can see the incomplete wrench package here:
> >> https://github.com/juju/juju/pull/508
> >>
> >> There are a few issues to nut out.
> >>
> >> 1. It needs to be difficult/impossible for someone to accidentally or
> >> maliciously activate this feature, especially in production
> environments. I
> >> have almost finished (but not pushed to Github) some changes to the
> wrench
> >> package which make it strict about the ownership and permissions on the
> >> wrench files. This should make it harder for the wrong person to drop
> files
> >> in to the wrench directory.
> >>
> >> The idea has also been floated to only enable this functionality in
> >> non-stable builds. This certainly gives a good level of protection but
> I'm
> >> slightly wary of this approach because it makes it impossible for CI to
> take
> >> advantage of the wrench feature when testing stable release builds. I'm
> >> happy to be conv

Re: Landing bot restrictions

2014-08-18 Thread Menno Smits
Be aware that I've seen JFDI used at least a couple of times over recent
weeks on PRs that *were* related to fixing a CI blocker but the person was
in too much of a hurry to add the appropriate __fixes-N__ comment. That
accounts for at least some of the JFDI usage (but certainly not all of it).

At any rate, we should surely be preferring the "fixes" comment over "JFDI"
whenever a PR is related to a CI blocker fix.


On 18 Aug 2014 15:46, "John Meinel"  wrote:

> If it gets abused too much we could probably restrict the JFDI flag to team
> owners, which would restrict it to the team leads.
>
> John
> =:->
> On Aug 18, 2014 1:57 AM, "Tim Penhey"  wrote:
>
>> Hi all,
>>
>> I'm sure everyone is aware of the landing bot changes that have gone in
>> from CI.  While these are painful now, they are helping us improve our
>> test passing percentages.
>>
>> There was a pressure release valve added, the "JFDI" flag, to override
>> the restrictions.
>>
>> I've seen this flag used more often than it really should.  Please don't
>> just add it because you think your branch is "really important". Please
>> consult with your team lead before adding this flag.
>>
>> Cheers,
>> Tim
>>


Fwd: Storage server temporarily offline

2014-08-18 Thread Menno Smits
An update from Github about the current outage.

-- Forwarded message --
From: John Greet (GitHub Staff) 
Date: 19 August 2014 12:37
Subject: Re: Storage server temporarily offline
To: Menno Smits 


Hi Menno,

Sorry about that, we're working on restoring access to the file server your
repository resides on. Status updates are being posted here:

https://status.github.com

If you're still unable to access the repository when our status is back to
green please let us know.

Thanks,
John


Re: Logger prefixes for api/apiserver are changing

2014-09-02 Thread Menno Smits
On 3 September 2014 15:00, John Meinel  wrote:

> ...
>>
>> There were two ideas proposed:
>>
>> loggo.LoggerForPackage("juju")
>>
>> which would walk up the path until it found a path element juju, and
>> construct the string that way. The other was do use a defined prefix:
>>
>
> The only problem here is that we have:
>   github.com/juju/juju/juju
> and
>   github.com/juju/juju/cmd/juju
>
> Which would confuse this algorithm. Though I like its brevity.
>

Ha! I was just about to hit send on an email saying exactly the same thing
:)

Also, when we talk about package paths we really mean the source tree path,
right? Everything in cmd/juju is in the "main" package but uses a logger named
"juju.cmd.juju". We can really use the real package name to set the logger
name.

- Menno


Re: Logger prefixes for api/apiserver are changing

2014-09-02 Thread Menno Smits
On 3 September 2014 15:07, Menno Smits  wrote:

>
>
> Also, when we talk about package paths we really mean the source tree path
> right? Every in cmd/juju is in the "main" package but uses a logger named
> "juju.cmd.juju". We can really use the real package name to set the logger
> name.
>

Urgh. s/can/can't/


Re: Github built-in side by side diffs finally exist

2014-09-08 Thread Menno Smits
Hooray! ... and just as we're about to move to Reviewboard :)


On 9 September 2014 03:27, Nate Finch  wrote:

> https://github.com/blog/1884-introducing-split-diffs
>
> Works a lot better than octosplit, too.
>


Re: Important Blocking Bug

2014-09-09 Thread Menno Smits
Given that I merged a big upgrade-related change yesterday, this could be
me (although I did test it extensively manually). I'll take a look.



On 10 September 2014 09:01, Nate Finch  wrote:

> https://bugs.launchpad.net/juju-core/+bug/1367431
>
> Someone from the down under crew, please take a look at this, since the US
> is currently at EOD or nearly so.
>
> -Nate
>


Re: Important Blocking Bug

2014-09-09 Thread Menno Smits
I've looked into this a bit and the problem is due to one of the machines
being unable to download tools for the upgrade. This could be due to the
recent changes in tools storage.

Ian has now taken over.

On 10 September 2014 09:11, Menno Smits  wrote:

> Given that I merged a big upgrade related change yesterday, this could be
> me (although I did test it extensively manually). I'll take a look.
>
>
>
> On 10 September 2014 09:01, Nate Finch  wrote:
>
>> https://bugs.launchpad.net/juju-core/+bug/1367431
>>
>> Someone from the down under crew, please take a look at this, since the
>> US is currently at EOD or nearly so.
>>
>> -Nate
>>


Re: ReviewBoard and our workflow

2014-09-10 Thread Menno Smits
Thanks Eric! I've used Reviewboard at a previous job and I'm fairly sure
that it aligns better with the way the Juju Core team likes to work than
Github's review features.

Two questions:

1. Is this what we're supposed to be doing from right now?

2. I'm pretty sure some configuration of the rbt tool is required so that
it knows how to talk to the Reviewboard server. Is there a config file
available?



On 11 September 2014 03:58, Eric Snow  wrote:

> Steps for a review of a PR:
>
> 1. create pull request in github
> 2. run "rbt post" while at your branch to create a review request [1][2]
> 3. open the review request in your browser and "publish" it [3]
> 4. add a comment to the PR with a link to the review request
> 5. address reviews until you get a "Ship It!"
> 6. add a $$merge$$ comment to the PR
>
> Both github and ReviewBoard support various triggers/hooks and both
> have robust HTTP APIs.  So we should be able to automate those steps
> (e.g. PR -> review request, "ship it" -> $$merge$$).  However, I don't
> see that automation as a prerequisite for switching over to
> ReviewBoard.
>
> Updating an existing review request:
> 1. run "rbt post -u" (or the explicit "rbt post -r #")
> 2. open the review request in your browser and "publish" it [3]
>
> FYI, Reviewboard supports chaining review requests.  Run rbt post
> --parent .
>
> I'll be updating the contributing doc relative to ReviewBoard (i.e.
> with the above info) once we settle in with the new tool.
>
> -eric
>
> [1] Make sure your branch is based on upstream master.  Otherwise this
> will not work right.
> [2] Reviewboard links revision IDs to review requests.  So if you
> already have a review request for a particular revision (e.g. your
> branch), then "rbt post" will fail.  Use "rbt post -u" or "rbt post -r
> #" instead.
> [3] rbt post has some options you should consider using:
>   - automatically publish the review request: rbt post --publish (or -p)
>   - open a browser window with the new review request: rbt post --open (or
> -o)
>


Re: Fixed reviewer schedule

2014-09-11 Thread Menno Smits
A possibly dumb question: is there any way to see who is on for a given
day? I can only see my own days.

On 11 September 2014 20:14, Dimiter Naydenov  wrote:

>
> On 11.09.2014 06:12, Tim Penhey wrote:
> > Hi folks,
> >
> > Those of you who are reviewers should now have invites to your
> > bi-weekly review time.  This now occurs on the same day every two
> > weeks. I have tried to have the mentors on the day after the
> > mentees (or overlapping).  Also tried to spread out the different
> > timezones.  It will never be perfect, but hopefully this is better
> > now.
> >
> > Cheers, Tim
> >
>
> Great job Tim! No more poking into spreadsheets :)
>
> --
> Dimiter Naydenov 
> juju-core team
>


Re: Fixed reviewer schedule

2014-09-11 Thread Menno Smits
I don't have a Juju Team calendar set up in my Canonical calendar. How do I
add it? Other newer team members might be in the same boat.

On 12 September 2014 11:28, Ian Booth  wrote:

> If you display the Juju Team calendar, you can see who's on what day, plus
> also
> who's on leave and other relevant things like that.
>
> On 12/09/14 09:24, Menno Smits wrote:
> > A possibly dumb question: is there any way to see who is on for a given
> > day? I can only see my own days.
> >
> > On 11 September 2014 20:14, Dimiter Naydenov <
> dimiter.nayde...@canonical.com
> >> wrote:
> >
> > On 11.09.2014 06:12, Tim Penhey wrote:
> >>>> Hi folks,
> >>>>
> >>>> Those of you who are reviewers should now have invites to your
> >>>> bi-weekly review time.  This now occurs on the same day every two
> >>>> weeks. I have tried to have the mentors on the day after the
> >>>> mentees (or overlapping).  Also tried to spread out the different
> >>>> timezones.  It will never be perfect, but hopefully this is better
> >>>> now.
> >>>>
> >>>> Cheers, Tim
> >>>>
> >
> > Great job Tim! No more poking into spreadsheets :)
> >
> >>
> >>
> >
> >
> >
>


Re: consensus on using separate *_test packages

2014-09-14 Thread Menno Smits
+1

On 13 September 2014 08:18, Gustavo Niemeyer  wrote:

> On Fri, Sep 12, 2014 at 3:16 PM, Nate Finch 
> wrote:
> > In the thread Eric pointed to, Brad Fitzpatrick (one of the core Go
> > developers) says they prefer to keep tests in the same package unless
> forced
> > to have them in a different package to avoid circular dependencies.  I
> like
> > that.
>
> Brad is a great guy, but defending a position because someone says so
> is not good reasoning, no matter who that is. I've provided a
> different point of view on that same thread, with reasoning, and had
> no counterarguments.
>
> > I have always thought that export_test was an anti-pattern that should
> only
> > be used as a last resort.  The main problem I have with export_test is
> that
> > it makes your tests lie.  Your tests call foo.Create, but package foo
> > doesn't actually export Create, it has a non-exported create method, that
> > happens to get exported in the export_test file.  This makes your tests
> more
> > confusing than if you just called "create" from an internal test.  That's
> > even aside from the point of the busywork it creates from having to write
> > the file in the first place.
>
> On the counter side, it greatly encourages you to keep your tests
> isolated from internal details, and forces you to make it very
> explicit when you're using non-public interfaces in your test, via a
> well known convention. Both are great real benefits, that easily
> outweigh the theoretical "lie" described.
>
> I'm not saying you should never use internal tests, though, but rather
> agreeing with people that posted before you on this thread.
>
> > One argument for export_test is that it gives you a canonical place to go
> > look for all the internal stuff that is getting used in tests... but I
> don't
> > actually think that's very valuable.  I'm not actually sure why it would
> be
> > important to see what internal functions are getting exported for use
> during
> > tests.
>
> The second part of that same paragraph answers the question with a
> counter example:
>
> >  And in theory, if you're writing extensive unit tests, almost all
> > the internal methods would get exported there... so it becomes just a
> huge
> > boilerplate file for no good reason.  If you really want to see what
> > functions are getting exercised during tests, use code coverage, it's
> built
> > into the go tool.
>
> This clearly describes one of the important reasons for the pattern.
> If every internal function is imported in tests, and assuming good
> code-writing practices for when to use a new function, it means the
> test is extremely tightly bound to the implementation.
>
> > I agree with Gustavo's point that tests can be good examples of how to
> use a
> > package.  And, in fact, Go supports example functions that are run like
> > tests, but also get displayed in generated docs.  These example tests can
> > exist in the package_test package to make them very accurate
> representations
> > of how to use the package.
>
> That was just one side I mentioned, in a long list of reasons, and in
> practice examples are pretty much always written well after the code
> is ready. How many public functions do you have in the juju code base?
>  How many of them are covered by those examples you mention?  How many
> of them are covered in tests?
>
> > Most tests that are written to test the functionality of a package are
> not
> > actually good examples of how to use the package.  Most tests are just
> > isolating one part of the logic and testing that.  No one using the
> package
> > for its intended purpose would ever do that.  This is doubly true of unit
> > tests, where you may just be testing the input and output of a single
> > internal function.
>
> That's very far from being true. My own tests exercise the logic I'm
> writing, with the API I designed and documented, and they reflect how
> people use that code. I commonly even copy & paste snippets out of my
> own tests into mailing lists as examples of API usage. Perhaps you
> write some sort of test that I'm not used to.
>
> > I think our current trend of doing most tests in an external package has
> > significantly contributed to the poor quality of our tests.  Because
> we're
> > running from outside the package, we generally only have access to the
> > exported API of the package, so we try to "make it work" by mocking out
> > large portions of the code in order to be able to call the external
> > function, rather than writing real unit tests that just test one small
> > portion of the internal functionality of the package.
>
> This insight fails to account for the fact that in practice "exported
> API" is precisely about the API that was made public in one particular
> isolated abstraction. Several of these "public APIs" are in fact very
> internal details to the implementation of juju, but that make sense in
> isolation. When you're testing these via their public API, it means
> you're actually exercising a

Re: ReviewBoard is now the official review tool for juju

2014-09-14 Thread Menno Smits
Eric,

Thanks for setting this up.

Firstly, in your email you mention "rbt pull" several times. I think you
mean "rbt post", right? I don't see a pull command documented anywhere.

I've run into one issue so far. When I tried to get my first review into
Reviewboard today it took me a long time to figure out how to get it to
generate the correct diff. After much gnashing of teeth I figured out that
"rbt post" generates a diff by comparing "origin/master" against the
current branch. This means that if you haven't updated your local master
branch recently *and pushed your local master branch to your personal fork
on Github* (this is the part I missed) you'll end up with diffs that
include lots of changes that have already been merged and have nothing to
do with your branch.

As things stand the workflow actually needs to be:

1. Ensure your feature branch is rebased against upstream/master
2. Create a pull request like normal via github.
3. Switch to your local master branch.
4. "git pull" to update master
5. "git push origin master" to update your personal master on github.
6. Switch back to your feature branch ("git checkout -" helps here)
7. Run "rbt post" while at your branch to create a review request.
8. Open the review request in your browser and "publish" it.
  - alternately use the rbt --open (-o) and/or --publish (-p) flags.
9. Add a comment to the PR with a link to the review request.
10. Address reviews until you get a "Ship It!" (like normal, with LGTM).
11. Add a $$merge$$ comment to the PR (like normal).

This is a bit confusing and inconvenient. I can see us all forgetting to
keep our personal master branches on GH updated.

It looks like the TRACKING_BRANCH option in .reviewboardrc could be
helpful. It defaults to "origin/master" but if we changed it to
"upstream/master" I suspect Reviewboard will then generate diffs against
the shared master branch instead of what we might happen to have in master
in our personal forks. This of course relies on every developer having a
remote called "upstream" that points to the right place (which isn't
necessarily true).

If TRACKING_BRANCH isn't going to work then whatever automation we come up
with to streamline RB/GH integration is probably going to have to sort this
out.

- Menno








On 14 September 2014 14:45, Eric Snow  wrote:

> Hi all,
>
> Starting now new code review requests should be made on
> http://reviews.vapour.ws (see more below on creating review requests).
> We will continue to use github for everything else (pull requests,
> merging, etc.).  I've already posted some of the below information
> elsewhere, but am repeating it here for the sake of reference.  I plan
> on updating CONTRIBUTING.md with this information in the near future.
> Please let me know if you have any feedback.  Happy reviewing!
>
> -eric
>
> Authentication
> --
> Use the Github OAuth button on the login page to log in.  If you don't
> have an account yet on ReviewBoard, your github account name will
> automatically be registered for you.  ReviewBoard uses session
> cookies, so once you have logged in you shouldn't need to log in again
> unless you log out first.
>
> For the reviewboard commandline client (rbt), use your reviewboard
> username and a password of "oauth:@github".  This should
> only be needed the first time.
>
> RBTools
> --
>
> ReviewBoard has a command-line tool that you can install on your local
> system called "rbt".  rbt is the recommended tool for creating and
> updating review requests.  The documentation covers installation
> and usage.  It has answered my questions thus far.
>
> https://www.reviewboard.org/docs/rbtools/0.6/
>
> The key sub-command is "post" (see rbt post -h).
>
> To install you can follow the instructions in the rbtools docs.  You
> can also install using pip (which itself may need to be installed
> first):
>
> $ virtualenv ~/.venvs/reviewboard
> $ ~/.venvs/reviewboard/bin/pip install --allow-unverified rbtools --allow-external rbtools rbtools
> $ alias rbt='~/.venvs/reviewboard/bin/rbt'
>
> (you could just "sudo pip install" it, but the --allow-unverified flag
> makes it kind of sketchy.)
>
> Workflow
> ---
>
> 1. Create a pull request like normal via github.
> 2. Run "rbt pull" while at your branch to create a review request.
>   - if the repo does not have a .reviewboardrc file yet, you'll need
> to run "rbt setup-repo".
>   - make sure your branch is based on an up-to-date master.
>   - if the revision already has a review request you will need to
> update it (see below).
> 3. open the review request in your browser and "publish" it.
>   - alternately use the rbt --open (-o) and/or --publish (-p) flags.
> 4. add a comment to the PR with a link to the review request.
> 5. address reviews until you get a "Ship It!" (like normal, with LGTM).
> 6. add a $$merge$$ comment to the PR (like normal).
>
> Keep in mind that the github-related steps aren't strictly necessary
> for the sake a getti

Re: ReviewBoard is now the official review tool for juju

2014-09-15 Thread Menno Smits
>
> FWIW, I have the following in $GOPATH/src/github.com/juju/juju/.git/config
> :
>
> [remote "origin"]
> url = g...@github.com:ericsnowcurrently/juju.git
> fetch = +refs/heads/*:refs/remotes/origin/*
> [remote "upstream"]
> url = https://github.com/juju/juju.git
> fetch = +refs/heads/*:refs/remotes/upstream/*
>

I have this too. I'm pretty sure this was encouraged when we switched to
Github.

Has anyone not got origin and upstream set up like this? Another way to
check is to run:

git remote -v | grep -e upstream -e origin


> and in ~/.reviewboardrc:
>
> TRACKING_BRANCH = "upstream/master"
>
> This has worked fine for me (once I realized master had to be up-to-date).
>

This sorts out the issue for me too, avoiding the need to worry about
keeping your personal master branch on GH updated.

If we establish that everyone has upstream set up as above then we should
update the checked-in .reviewboardrc files for each repo.

- Menno


Re: ReviewBoard is now the official review tool for juju

2014-09-15 Thread Menno Smits
On 16 September 2014 08:39, Ian Booth  wrote:

>
>
> On 16/09/14 00:50, Eric Snow wrote:
>
>
> > Step (0) is also pretty easy and I'll argue that people should be
> > doing it anyway.
> >
>
> Disagree :-)
> I never (or very, very rarely) had to do this with Launchpad and bzr and
> things
> Just Worked. I don't do it now with github and pull requests. I'd like to
> think
> we'd be able to avoid the burden moving forward also.
>

Sorry, I didn't mean for this to turn into a rebase vs merge discussion. A
different choice of wording would have helped. The first step could have
been written like:

0.  Sync up your feature branch with upstream (by merging or rebasing)

Some people like rebasing and some like merging. It doesn't matter much to
the rest of the team which approach you use but I presume that everyone
syncs up their branch somehow soon before proposing (rerunning tests etc)
to ensure that other people's changes haven't impacted theirs.

- Menno


Review Board keyboard shortcuts

2014-09-16 Thread Menno Smits
Thumper was asking earlier if RB had keyboard shortcuts and it turns out
it does:

https://www.reviewboard.org/docs/manual/dev/users/reviews/reviewing-diffs/#keyboard-shortcuts

Unfortunately the previous and next comment bindings don't seem to work for
me but the others do. Is it just me?


Re: Someone please look at these bugs ASAP

2014-09-17 Thread Menno Smits
On 18 September 2014 08:16, Nate Finch  wrote:

>
>
>
> *Unable to connect to environment after local upgrade on precise*
> https://bugs.launchpad.net/juju-core/+bug/1370635
>

At first glance, this one seems like it could be related to recent
multi-env state server changes. I'll take a look soon (thumper is out
today).


Using subdocument _id fields for multi-environment support

2014-09-30 Thread Menno Smits
Team Onyx has been busy preparing for multi-environment state server
support. One piece of this is updating almost all of Juju's collections to
include the environment UUID in document identifiers so that data for
multiple environments can co-exist in the same collection even when they
otherwise have the same identifier (machine id, service name, unit name etc).

Based on discussions on juju-dev a while back[1] we have started doing
this by prepending the environment UUID to the _id field and adding extra
fields which provide the environment UUID and old _id value separately for
easier querying and handling.

So far, services and units have been migrated. Where previously a service
document looked like this:

type serviceDoc struct {
    Name   string `bson:"_id"`
    Series string
    ...

it now looks like this:

type serviceDoc struct {
    DocID   string `bson:"_id"`      // "<env-uuid>:wordpress"
    Name    string `bson:"name"`     // "wordpress"
    EnvUUID string `bson:"env-uuid"` // "<env-uuid>"
    Series  string
    ...

Unit documents have undergone a similar transformation.

This approach works but has a few downsides:

   - it's possible for the local id ("Name" in this case) and EnvUUID
   fields to become out of sync with the corresponding values that make up
   the _id. If that ever happens, very bad things could occur.
   - it somewhat unnecessarily increases the document size, requiring that
   we effectively store some values twice
   - it requires slightly awkward transformations between UUID prefixed and
   unprefixed IDs throughout the code
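For illustration, those transformations look roughly like this (a sketch
only; the helper names docID and localID are hypothetical, not necessarily
what the state package uses):

```go
package main

import (
	"fmt"
	"strings"
)

// docID builds the UUID-prefixed _id used by the string-id scheme.
func docID(envUUID, id string) string {
	return envUUID + ":" + id
}

// localID strips the environment UUID prefix from an _id value,
// returning unprefixed values unchanged.
func localID(id string) string {
	parts := strings.SplitN(id, ":", 2)
	if len(parts) == 2 {
		return parts[1]
	}
	return id
}

func main() {
	id := docID("e4d06a9b", "wordpress")
	fmt.Println(id)          // e4d06a9b:wordpress
	fmt.Println(localID(id)) // wordpress
}
```

Every read path has to remember to apply one of these at the right moment,
which is exactly the friction the third bullet describes.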

MongoDB allows the _id field to be a subdocument so Tim asked me to
experiment with this to see if it might be a cleaner way to approach the
multi-environment conversion before we update any more collections. The
code for these experiments can be found here:
https://gist.github.com/mjs/2959bb3e90a8d4e7db50 (I've included the output
as a comment on the gist).

What I've found suggests that using a subdocument for the _id is a better
way forward. This approach means that each field value is only stored once
so there's no chance of the document key being out of sync with other
fields and there's no unnecessary redundancy in the amount of data being
stored. The fields in the _id subdocument are easy to access individually
and can be queried separately if required. It is also possible to create
indexes on specific fields in the _id subdocument if necessary for
performance reasons.

Using this approach, a service document would end up looking something like
this:

type serviceDoc struct {
    ID     serviceId `bson:"_id"`
    Series string
    ...
}

type serviceId struct {
    EnvUUID string `bson:"env-uuid"`
    Name    string
}
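With a subdocument _id, a lookup can target either the whole key or an
individual field via dot notation. A sketch of the two selector shapes,
using a plain stand-in type instead of mgo's bson.D so it stays
self-contained (note that matching on the whole _id subdocument in MongoDB
is field-order sensitive, so the selector must list the fields in stored
order):

```go
package main

import "fmt"

// elem and docD are stand-ins for bson.DocElem and bson.D; with mgo
// you would use bson.D directly.
type elem struct {
	Name  string
	Value interface{}
}
type docD []elem

// wholeIDSelector matches the entire _id subdocument; the field order
// must match the stored order exactly.
func wholeIDSelector(envUUID, name string) docD {
	return docD{{"_id", docD{{"env-uuid", envUUID}, {"name", name}}}}
}

// fieldSelector matches a single field of the _id via dot notation,
// independent of the other _id fields.
func fieldSelector(envUUID string) docD {
	return docD{{"_id.env-uuid", envUUID}}
}

func main() {
	fmt.Println(wholeIDSelector("u1", "wordpress"))
	fmt.Println(fieldSelector("u1"))
}
```

Querying "_id.env-uuid" alone is what would make per-environment scans
straightforward under this scheme.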

There was some concern in the original email thread about whether
subdocument style _id fields would work with sharding. My research and
experiments suggest that there is no issue here. There are a few types of
indexes that can't be used with sharding, primarily "multikey" indexes, but
I can't see us using these for _id values. A multikey index is used by
MongoDB when a field used as part of an index is an array - it's highly
unlikely that we're going to use arrays in _id fields.

Hashed indexes are a good basis for well-balanced shards according to the
MongoDB docs so I wanted to be sure that it's OK to create a hashed index
for subdocument style fields. It turns out there's no issue here (see
TestHashedIndex in the gist).

Using subdocuments for _id fields is not going to prevent us from using
MongoDB's sharding features in the future if we need to.

Apart from having to rework the changes already made to the services and
units collections[2], I don't see any downsides to this approach. Can
anyone think of something I might be overlooking?

- Menno


[1] - subject was "RFC: mongo "_id" fields in the multi-environment juju
server world"

[2] - this work will have to be done before 1.21 has a stable release
because the units and services changes have already landed.


Re: Using subdocument _id fields for multi-environment support

2014-10-01 Thread Menno Smits
On 2 October 2014 01:31, Kapil Thangavelu 
wrote:

> it feels a little strange to use a mutable object for an immutable field.
> that said it does seem functional. although the immutability speaks to the
> first disadvantage noted for the separate fields namely becoming out of
> sync, which afaics isn't something that's possible with the current model,
> ie. a change of name needs to generate a new doc. Names (previous _id) are
> unique in usage minus the extant bug that unit ids are reused. even without
> that the benefits to avoiding the duplicate doc data and manual parse on
> every _id seem like clear wins for subdoc _ids.
>

Just to be really sure, I added a test that exercises the case of one of
the _id fields changing. See TestAttemptedIdUpdate in the (just updated)
gist. MongoDB stops us from doing anything stupid (as expected).




Re: Using subdocument _id fields for multi-environment support

2014-10-10 Thread Menno Smits
(doh...hit send accidentally)

TL;DR: It has been surprisingly difficult to use subdocument ids within
Juju and we are abandoning this approach. As per what has already been done
for the units and services collections, we will continue with the approach
of using "<env-uuid>:<id>" style string ids while also adding separate env-UUID
and collection specific identifier fields.


Jesse and I have been making the changes to use subdocument ids for the
units and services collections this week at the sprint and we've come up
against some unexpected issues.

We found that one part of the mgo/txn package wasn't happy using struct ids
and have been working with Gustavo to fix that. This isn't a show-stopper
but has slowed us down.

We also found unexpected friction with the implementation of the watchers
and entity life. These areas deeply assume that our document ids are
strings and fixing them requires wide-ranging and often ugly changes which
will take significant time to get right. It's been brick wall after brick
wall. We discussed with Tim, Will, John and Ian yesterday and given that
it's important that multi-environment support lands soon and given that the
watchers are going to completely change in the not too distant future[1],
we have abandoned the approach of using subdocument ids for
multi-environment support. The benefits of using subdocument ids are
outweighed by the amount of change required to get there.

- Menno

[1] opening up the possibility of surrogate keys as document ids, where we
need application domain fields to exist outside of the _id.


On 10 October 2014 10:09, Menno Smits  wrote:

> TL;DR: It has been surprisingly difficult As per what has already been
> done for the units and services collections, we will continue with the
> approach of using ":" style string ids while also adding separate
> env-UUID and collection specific identifier fields.
>
> Jesse and I have been making the changes to use subdocument ids for the
> units and services collections this week at the sprint and we've come up
> against some unexpected issues.
>
> We found that one part of the mgo/txn package wasn't happy using struct
> ids and have been working with Gustavo to fix that. This isn't a
> show-stopper but has slowed us down.
>
> We also found unexpected friction with the implementation of the watchers
> and entity life. These areas deeply assume that our document ids are
> strings and fixing them requires wide-ranging and often ugly changes which
> will take significant time to get right. It's been brick wall after brick
> wall. We discussed with Tim, Will, John and Ian yesterday and given that
> it's important that multi-environment support lands soon and given that the
> watchers are going to completely change in the not too distant future[1],
> we have abandoned the approach of using subdocument idfs for
> multi-environment support. The benefits of using subdocuments ids are
> outweighed by the chan
>
>
>
> - Menno
>
> [1] opening up the possibility of surrogate keys as document ids, where we
> need application domain fields to exist fields outside of the _id.
>
>
> On 1 October 2014 22:11, Menno Smits  wrote:
>
>>
>>
>> On 2 October 2014 01:31, Kapil Thangavelu > > wrote:
>>
>>> it feels a little strange to use a mutable object for an immutable
>>> field. that said it does seem functional. although the immutability speaks
>>> to the first disadvantage noted for the separate fields namely becoming out
>>> of sync, which afaics isn't something that's possible with the current
>>> model, ie. a change of name needs to generate a new doc. Names (previous
>>> _id) are unique in usage minus the extant bug that unit ids are reused.
>>> even without that the benefits to avoiding the duplicate doc data and
>>> manual parse on every _id seem like clear wins for subdoc _ids.
>>>
>>
>> Just to be really sure, I added a test that exercises the case of one of
>> the _id fields changing. See TestAttemptedIdUpdate in the (just updated)
>> gist. MongoDB stops us from doing anything stupid (as expected).
>>
>>
>


Re: review board not syncing?

2014-11-11 Thread Menno Smits
Since I'm on-call reviewing today and don't know how to debug the problem,
I will watch both Github and RB for updates. For branches that don't make
it to RB, I'll review on GH.

On 12 November 2014 08:28, Jesse Meek  wrote:

> The latest three reviews on GitHub (#1103,#1102,#1101) I cannot see in
> Review Board. Do we have a loose wire?
>
>


Automatic environment filtering for DB queries

2014-11-12 Thread Menno Smits
Team Onyx has been busy preparing the MongoDB collections used by Juju to
support data for multiple environments. For each collection that needs to
store data for multiple environments we have been adding an "env-uuid"
field to each document and prefixing the environment UUID to the document
id, with the previous document id being moved to a new field. Apart from
the document changes themselves, we've been adjusting the state package
implementation to match the document changes.

Part of this task is ensuring that all DB queries correctly filter by
environment so that we don't end up unintentionally leaking data across
environments. To avoid opportunities for us to forget to add environment
filtering to the DB queries used by Juju, sabdfl would like us to consider
ways to make this filtering happen automatically. To this end, I propose we
add the following methods:

func (s *State) Find(coll *mgo.Collection, sel bson.D) *mgo.Query
func (s *State) FindId(coll *mgo.Collection, id string) *mgo.Query


The idea is that almost all MongoDB queries performed by Juju would use
these 2 methods. They would become the default way that we do queries, used
even on collections that don't contain data for multiple environments.

Both methods would check the collection passed in against a fixed set of
collections that use environment UUIDs. If the collection doesn't use
environment UUIDs then the lookup is passed through to mgo unmodified. If
the collection *does* use environment UUIDs then Find() would automatically
add the appropriate "env-uuid" field to the query selector.  Similarly,
FindId() would automatically call docID() on the supplied id. Pretty simple.

If use of these methods becomes the default way the team does DB queries in
Juju code, then we greatly reduce the risk of leaking data between
environments. They also allow us to remove one concern from each
Find/FindId call site - as environment filtering is taken care of by these
methods, it does not have to be repeated throughout the codebase. To get
us started, I intend to perform a mass-refactoring of all existing Find
and FindId calls to use these new methods.

To make the proposal clearer, here's some examples:

Find call:   err := units.Find(bson.D{{"env-uuid", ...}, {"service",
service}}).All(&docs)
  becomes:   err := st.Find(units, bson.D{{"service", service}}).All(&docs)

FindId call: err = units.FindId(w.st.docID(unitName)).One(&doc)
    becomes: err = w.st.FindId(units, unitName).One(&doc)
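A minimal sketch of the filtering the two proposed methods would perform
(the mgo plumbing is elided and a stand-in for bson.D is used so the sketch
is self-contained; envCollections, findSelector and findIDSelector are
illustrative names, not proposed API):

```go
package main

import "fmt"

// elem and docD stand in for bson.DocElem and bson.D.
type elem struct {
	Name  string
	Value interface{}
}
type docD []elem

// State carries the environment UUID and knows which collections are
// environment-scoped.
type State struct {
	envUUID        string
	envCollections map[string]bool
}

// findSelector returns the selector Find would pass to mgo: for
// environment-scoped collections the env-uuid condition is prepended.
func (s *State) findSelector(coll string, sel docD) docD {
	if !s.envCollections[coll] {
		return sel
	}
	return append(docD{{"env-uuid", s.envUUID}}, sel...)
}

// findIDSelector returns the _id FindId would look up: docID-style
// prefixing is applied automatically for environment-scoped collections.
func (s *State) findIDSelector(coll, id string) string {
	if !s.envCollections[coll] {
		return id
	}
	return s.envUUID + ":" + id
}

func main() {
	st := &State{
		envUUID:        "u1",
		envCollections: map[string]bool{"units": true},
	}
	fmt.Println(st.findSelector("units", docD{{"service", "wordpress"}}))
	fmt.Println(st.findIDSelector("units", "wordpress/0")) // u1:wordpress/0
}
```

Collections not in the set pass through untouched, which is what lets the
methods be used uniformly even on single-environment collections.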

Does this sound reasonable? Is there another approach I should be
considering?

- Menno


Re: Automatic environment filtering for DB queries

2014-11-18 Thread Menno Smits
Thanks John and Tim. I really like these ideas, especially because it means
the team doesn't need to learn a new way of working (and remember to keep
using it). In most cases, the thing returned by getCollection() can be used
in the same way as before, even though it'll actually be a different type.
I'll try this approach out in the next day or so.

I have thought about what can be done about ensuring env UUIDs are
correctly added or updated during inserts and updates but I don't think
there's anything practical we can do there. I think we need to rely on
diligence and testing to ensure that writes to the DB correctly handle
environment UUIDs.



On 18 November 2014 17:03, Tim Penhey  wrote:

> Wrapping the collection objects that are returned from the
> `getCollection` method shouldn't be too hard and it could wrap the Find
> and FindId calls.  It would have fixed this missed call below:
>
> // aliveUnitsCount returns the number of alive units for the service.
> func aliveUnitsCount(service *Service) (int, error) {
> units, closer := service.st.getCollection(unitsC)
> defer closer()
>
> query := bson.D{{"service", service.doc.Name}, {"life", Alive}}
> return units.Find(query).Count()
> }
>
> However it is not just finding that we need to care about, but setting
> and updating the collections.  Hopefully testing would cover the cases
> where we aren't setting what we think we are setting, but that is much
> harder to catch as the main execution flow is to just "run these
> transaction operations".
>
> Tim
>
>
> On 18/11/14 16:45, John Meinel wrote:
> > I've had this around to think about for a bit. I think it is ok, though
> > sniffing an input parameter to change behavior seems brittle. Mostly
> > because the object isn't really designed to work that way. Could we wrap
> > the objects so that we just have Find/FindId do the right thing to start?
> >
> > I suppose that is a fair amount more work. I certainly do like the
> > idea of having a common pattern rather than requiring everyone to know
> > exactly whether this object is different than the rest.
> >
> > John
> > =:->
> >
> >
> > On Thu, Nov 13, 2014 at 8:10 AM, Menno Smits wrote:
> >
> > Team Onyx has been busy preparing the MongoDB collections used by
> > Juju to support data for multiple environments. For each collection
> > that needs to store data for multiple environments we have been
> > adding the a "env-uuid" field to each document and prefixing the
> > environment UUID to the document id, with the previous document id
> > being moved to a new field. Apart from the document changes
> > themselves, we've been adjusting the state package implementation to
> > match the document changes.
> >
> > Part of this task is ensuring that all DB queries correctly filter
> > by environment so that we don't end up unintentionally leaking data
> > across environments. To avoid opportunities for us to forget to add
> > environment filtering to the DB queries using by Juju, sabdfl would
> > like us to consider ways to make this filtering happen
> > automatically. To this end, I propose we add the following methods:
> >
> > func (s *State) Find(coll *mgo.Collection, sel bson.D) *mgo.Query
> > func (s *State) FindId(coll *mgo.Collection, id string)
> *mgo.Query
> >
> >
> > The idea is that almost all MongoDB queries performed by Juju would
> > use these 2 methods. They would become the default way that we do
> > queries, used even on collections that don't contain data for
> > multiple environments.
> >
> > Both methods would check the collection passed in against a fixed
> > set of collections that use environment UUIDs. If the collection
> > doesn't use environment UUIDs then the lookup is passed through to
> > mgo unmodified. If the collection /does/ use environment UUIDs then
> > Find() would automatically add the appropriate "env-uuid" field to
> > the query selector.  Similarly, FindId() would automatically call
> > docID() on the supplied id. Pretty simple.
> >
> > If use of these methods becomes default way the team does DB queries
> > in Juju code, then we greatly reduce the risk of leaking data
> > between environments. They also allows us to remove one concern from
> > each Find/FindId call site - as environment 

Re: Also found all-machines.log issue with 1.21

2014-11-26 Thread Menno Smits
On 27 November 2014 at 10:52, Tim Penhey  wrote:

> https://bugs.launchpad.net/juju-core/+bug/1396796
>
> Would love other people to check.
>
> I don't know if it happens on other providers, just using local right now.
>
> I would appreciate it if someone could confirm working or not working on
> EC2.
>

I've checked on EC2 and the problem doesn't appear to happen there, both
when upgrading and when starting with a fresh install. It seems this is a
local-only issue.

I've been digging into this problem with the local provider but don't have
an answer yet, just theories.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Also found all-machines.log issue with 1.21

2014-11-26 Thread Menno Smits
The fix for this is merging into master and 1.21 now.

The problem was that for local provider based environments, unit and
machine agents weren't including the required namespace in their log
message tags. rsyslogd on the server was expecting to see the namespace so
it filtered out the unexpected messages. I think this regression was
probably introduced when we started using a Go syslog client on non-state
server machines instead of rsyslogd (so that we can have non-Linux
machines).



On 27 November 2014 at 14:23, Menno Smits  wrote:

>
>
> On 27 November 2014 at 10:52, Tim Penhey  wrote:
>
>> https://bugs.launchpad.net/juju-core/+bug/1396796
>>
>> Would love other people to check.
>>
>> I don't know if it happens on other providers, just using local right now.
>>
>> I would appreciate it if someone could confirm working or not working on
>> EC2.
>>
>
> I've checked on EC2 and the problem doesn't appear to happen there, both
> when upgrading or starting with fresh install. Seems like this is a local
> only issue.
>
> I've been digging in to this problem with the local provider but don't
> have an answer yet, just theories.
>
> - Menno
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Joyent networking issues

2014-12-11 Thread Menno Smits
For the last day and a half I've been looking at this bug:
https://bugs.launchpad.net/juju-core/+bug/1401130

There's a lot of detail attached to the ticket but the short story is that
the Joyent cloud often allocates different internal networks to instances,
meaning that they can't communicate. From what I can tell from relevant LP
tickets, this has been a problem for a long time (perhaps always). It's
very hit and miss - sometimes you get allocated 10 machines in a row that
all end up with the same internal network, but more often than not it only
takes 2 or 3 machine additions before running into one that can't talk to
the others.

I have found a forum post where someone from Joyent suggests adding a
static route for 10.0.0.0/8 to force all internal traffic down the internal
network interface. I've tried this out and it does indeed work. We *could*
have cloud-init install such a static route as new instances are configured
but that's a pretty gross hack that hardcodes an assumption in Juju about
Joyent's network setup which will no doubt bite us down the track.
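For concreteness, the cloud-init hack could look something like the fragment below. This is purely hypothetical - the internal interface name ("net1") and the use of bootcmd are assumptions, not what Juju actually generates:

```yaml
#cloud-config
bootcmd:
  # Force all 10.0.0.0/8 traffic out the internal interface so that
  # instances on different Joyent internal networks can still reach
  # each other. The interface name "net1" is an assumption here.
  - ip route add 10.0.0.0/8 dev net1 || true
```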

Another possible workaround could be to have machines on Joyent communicate
via their public addresses, ignoring the internal network. I'm not sure how
hard this is.

Andrew has played around with the Joyent API and curiously the ListNetworks
API returns different networks to those that actually get assigned to the
instances. I hacked up the Joyent provisioner to use these networks but
that didn't seem to help.

I have opened a support ticket with Joyent to get clarification (no
response yet).

Given that this is looking like a problem/feature at Joyent's end that
needs clarification from them, may I suggest that this issue is no longer
allowed to block CI?

If there's other ideas about what's going on here, please speak up.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Joyent networking issues

2014-12-14 Thread Menno Smits
On 15 December 2014 at 01:18, John Meinel  wrote:
>
> That sounds like you're just excluding the entire 10.0.* range from going
> via the Gateway, which is fine, but then why isn't the subnet mask 10.0/16
> in the first place ? Or maybe it even needs to be 10.0.0.0/8 ?
>

The internal IPs and netmasks being used by instances are assigned by
Joyent. Juju isn't deciding on the netmasks - Joyent assigns various
10.x.x.x/21 networks. When 2 machines end up on different internal
networks, traffic destined for the other network goes out the public
interface and gets dropped at the next hop (probably by anti-spoofing
configuration on a router/firewall).



> Probably the big concern for something like 10.0.0.0/8 would be if/when
> we do overlay networks and then there are separate 10.? networks that
> shouldn't be routed the same.
>

Agreed that this is a concern but at least if a single 10/8 route is added,
any more specific routes for 10.x.x.x that also get added for overlay
networks will take precedence (Linux uses the most specific route). Not
ideal though.

Joyent support has gotten back to me and repeated what I already found in
that forum post: that a static route should be added. They also mention:
"this is a known bug in previous platform images(the underlying cloudOS).
The operations team is working to update the impacted images, but the
solution is to add route statements to allow access to the respective
VLANS."  I presume the "bug" as far as they're concerned is that the routes
aren't added automatically.

After discussing with Tim, I'm going to make a change to have cloud-init
create the static route for Joyent deployments, with a ticket to track the
fact that this hack is in place.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Joyent networking issues

2014-12-14 Thread Menno Smits
On 13 December 2014 at 06:34, Curtis Hovey-Canonical 
wrote:
>
> Thank you Menno.
>
> On Fri, Dec 12, 2014 at 12:01 AM, Menno Smits 
> wrote:
> > For the last day and a half I've been looking at this bug:
> > https://bugs.launchpad.net/juju-core/+bug/1401130
> >
> > There's a lot of detail attached to the ticket but the short story is
> that
> > the Joyent cloud often allocates different internal networks to
> instances,
> > meaning that they can't communicate. From what I can tell from relevant
> LP
> > tickets, this has been a problem for a long time (perhaps always). It's
> very
> > hit and miss - sometimes you get allocated 10 machines in a row that all
> end
> > up with the same internal network, but more often than not it only takes
> 2
> > or 3 machine additions before running into one that can't talk to the
> > others.
>
> Your analysis explains a lot about the intermittent failures we
> have observed in Juju CI for months.
> ...
>
> > Given that this is looking like a problem/feature at Joyent's end that
> needs
> > clarification from them, may I suggest that this issue is no longer
> allowed
> > to block CI?
>
> Speaking for users, there is a regression.
>
> ...


>
> We do see intermittent failures using 1.20 in the joyent cloud health
> check. So we know statistically the problem does exist for every
> juju, but we are seeing 100% failure for master tip. The success rates
> were better for master last week, and the rates for 1.20 and 1.21 are
> great for all weeks.
>

Based on what we're seeing in CI, I'm thinking there are 3 things at play
here:

1. The new networker wasn't playing well with the way the network
configuration files are set up in Joyent images. Dimiter has disabled the
networker on Joyent for now, increasing the chance of success for 1.21 and
master.

2. As discussed throughout this thread, instances can end up on different
internal networks. This is a Joyent issue which can affect any Juju
release. It's just up to chance whether the tests will pass on Joyent in CI
- if one of the instances that is assigned isn't on the same internal network
as the others the test run will fail. Adding a static route for 10.0.0.0/8
should fix this.

3. Some other issue, yet to be determined, is preventing the Joyent tests
from passing on master only. I will start investigating this once the
static route is being added automatically.



>
>
> --
> Curtis Hovey
> Canonical Cloud Development and Operations
> http://launchpad.net/~sinzui
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Joyent networking issues

2014-12-14 Thread Menno Smits
>
>
> Based on what we're seeing in CI, I'm thinking there are 3 things at play
> here:
>
> 1. The new networker wasn't playing well with the way the network
> configuration files are set up in Joyent images. Dimiter has disabled the
> networker on Joyent for now, increasing the chance of success for 1.21 and
> master.
>
> 2. As discussed throughout this thread, instances can end up on different
> internal networks. This is a Joyent issue which can affect any Juju
> release. It's just up to chance whether the tests will pass on Joyent in CI
> - if one of the instances that is assigned isn't on the same internal network
> as the others the test run will fail. Adding a static route for 10.0.0.0/8
> should fix this.
>

The merge for this just completed.


> 3. Some other issue, yet to be determined, is preventing the Joyent tests
> from passing on master only. I will start investigating this once the
> static route is being added automatically.
>

Looking more closely at the joyent-deploy-trusty-amd64 job you can see that
there are actually 2 reasons for it failing. Sometimes bootstrap fails and
that is clearly because of the routing issue (#2 above, should now be
fixed).

Sometimes bootstrap succeeds, but the "juju deploy ... local:dummy-source"
command fails with a non-zero exit. Based on the log output it looks like
the API connection gets terminated unexpectedly. This might indicate a
panic while handling the API request. Unfortunately the logs that are saved
with the test failure stop well before bootstrap has completed, so we have
little to go on.

I haven't been able to make the problem happen when manually copying what
the test does from my own machine. This might require use of the actual
test CI infrastructure to reproduce.

I've created another ticket to track this issue:
https://bugs.launchpad.net/juju-core/+bug/1402495

I'm guessing CI will remain blocked until this is fixed, so someone will
need to continue with this since I'm about to EOD.

- Menno



-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Blocking and unblocking merges

2014-12-17 Thread Menno Smits
There seems to be some confusion as to how merges get blocked and
unblocked. Specifically:

1. How do merges get blocked?

2. Who/what decides to unblock merges?

3. How do merges get unblocked?

4. If the unblock process involves manual steps, whose responsibility is it
to perform those steps?

Based on experience and observation, I think I know how at least some of
this works but could we please have some authoritative answers?

Thanks,
Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Automatic multi-environment collection handling

2014-12-17 Thread Menno Smits
I've landed several big changes recently which automate handling of
multi-environment concerns when accessing Juju's collections. These changes
simplify DB queries and updates and reduce the risk of unintended data
leakage between environments. Although in most cases you won't even know
that anything has changed, it's worth understanding what's been done.

*Collections*

MongoDB queries against collections which contain data for multiple
environments are now automatically modified to ensure they return records
only for the environment tied to the State instance being queried. Queries
against collections which do not contain data for multiple environments
pass through untouched.

Some examples ...

machines.FindId("2").One(&doc)
    becomes
machines.FindId("<uuid>:2").One(&doc)

machines.Find(bson.D{{"series", "trusty"}})
    becomes
machines.Find(bson.D{{"series", "trusty"}, {"env-uuid", "<uuid>"}})

machines.Find(bson.D{{"_id", "4"}})
    becomes
machines.Find(bson.D{{"_id", "<uuid>:4"}, {"env-uuid", "<uuid>"}})

Where "<uuid>" is the environment UUID of the State instance the collection
was obtained from (using getCollection()).

The Remove, RemoveId and RemoveAll methods on collections also have similar
handling and the collection Count method returns only the number of records
in the collection for a single environment.

The main benefit of this is that you don't need to remember to wrap ids in
State.docID() calls or remember to add the "env-uuid" field to queries. In
fact, I recommend you leave them out to reduce noise from code that does DB
queries.

There are some limited cases where you might really need to query across
multiple environments or don't want the automatic munging in place for some
reason. For these scenarios you can get hold of a *mgo.Collection by
calling State.getRawCollection(). This is currently only being used by a
few database migration steps.

Note that query selectors using MongoDB operators with the _id field will
be left untouched. In these cases you need to know that there's a UUID
prefix on the _id and handle it yourself. For example, to query all the
machines with ids starting with "4" you might consider doing:

machines.Find(bson.D{{"_id", bson.M{"$regex": "^4.*"}}})
which is transformed to:
machines.Find(bson.D{{"_id", bson.M{"$regex": "^4.*"}},
{"env-uuid", "<uuid>"}})

Note how the _id selector is left alone but the env-uuid selector is still
added. It's left up to the developer to account for the environment UUID in
the _id regex (the regex above won't work as is).


*Transactions*

Changes have also been made for automatically modifying transaction
operations to account for multi-environment collections.

For example:

st.runTransaction([]txn.Op{{
    C:      machinesC,
    Id:     "1",
    Remove: true,
}, {
    C:  machinesC,
    Id: "2",
    Insert: bson.D{
        {"series", "trusty"},
    },
}, {
    C:  machinesC,
    Id: "3",
    Insert: &machineDoc{
        Series: "trusty",
    },
}, {
    C:      otherC,
    Id:     "foo",
    Insert: bson.D{},
}})

automatically becomes:

st.runTransaction([]txn.Op{{
    C:      machinesC,
    Id:     "<uuid>:1",
    Remove: true,
}, {
    C:  machinesC,
    Id: "<uuid>:2",
    Insert: bson.D{
        {"_id", "<uuid>:2"},
        {"env-uuid", "<uuid>"},
        {"series", "trusty"},
    },
}, {
    C:  machinesC,
    Id: "<uuid>:3",
    Insert: &machineDoc{
        DocID:   "<uuid>:3",
        EnvUUID: "<uuid>",
        Series:  "trusty",
    },
}, {
    C:      otherC,
    Id:     "foo",
    Insert: bson.D{},
}})

Note how the environment UUID is prefixed onto ids for operations for
multi-environment collections. Also see how the _id and env-uuid field on
documents defined using bson.D or structs (bson.M supported too) are
automatically populated. A panic will occur if you provide the environment
UUID but it doesn't match what was expected as this indicates a likely bug.
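The consistency rule in that last sentence can be sketched like this. The function name ensureEnvUUID is illustrative - the real check lives inside the multi-environment transaction munging code:

```go
package main

import "fmt"

// ensureEnvUUID mirrors the rule described above: if a document already
// carries an env UUID it must match the State's UUID; if it is empty it
// is populated automatically. A mismatch indicates a likely bug, so the
// function panics rather than silently writing the wrong environment.
func ensureEnvUUID(docUUID, stateUUID string) string {
	if docUUID == "" {
		return stateUUID // populate automatically
	}
	if docUUID != stateUUID {
		panic(fmt.Sprintf("env UUID mismatch: %q != %q", docUUID, stateUUID))
	}
	return docUUID
}

func main() {
	fmt.Println(ensureEnvUUID("", "deadbeef"))         // deadbeef
	fmt.Println(ensureEnvUUID("deadbeef", "deadbeef")) // deadbeef
}
```

Panicking here (rather than returning an error) is deliberate: a mismatched UUID in a transaction op can only be a programming error, never valid input.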

Any document updates are made in place so that the caller sees them once
the transaction completes. This makes it safe for the caller to reuse a
document struct passed to a transaction operation for further work - the
struct will match what was written to the DB. Note that if a struct is passed by value
and needs updating, a panic will occur. This won't normally be a problem as
we tend to use pointers to document structs with transaction operations,
and the panic is a helpful indication that the document provided isn't
multi-environment safe.

Note that only the Id and Insert fields of txn.Op are touched. The Update
and Assert fields are left alone.

In some cases you may need to run a transaction without invoking automatic
multi-environment munging. State now has rawTxnRunner() and
runRawTransaction() methods for the rare situations where this is
necessary. Please use these sparingly.

*Performance*

With the extra work being done to implement automatic multi-environment
support, the performance impact was a concern. I have compared multiple
runs of the state package unit tests, with and without these changes and
the difference is lost in the noise.

If you see any problems or ha

Re: Automatic multi-environment collection handling

2014-12-18 Thread Menno Smits
On 19 December 2014 at 06:02, Dimiter Naydenov <
dimiter.nayde...@canonical.com> wrote:
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> All this is great work! Thanks for the write-up as well.
>
> I think I discovered an issue with it - take a look at this bug
> http://pad.lv/1403738. It seems machine.SetAgentVersion() should be
> handled specially by the multi-env transaction runner, as only after
> calling it the upgrade actually starts and the steps to add env-uuids
> to state collections are executed.


Sorry - I neglected to do a manual upgrade test before pushing this change.
Ensuring that code that runs before database migrations have occurred still
works as it should has been a pain point for us while doing the
multi-environment work.

I will get this sorted. Thanks for saving me some time by doing the initial
analysis.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Automatic multi-environment collection handling

2014-12-18 Thread Menno Smits
Just following up. This was fixed earlier on today and the various CI
upgrade jobs are now passing. I've marked the ticket as Fix Released so
that this issue no longer blocks merges.

On 19 December 2014 at 09:40, Menno Smits  wrote:
>
> On 19 December 2014 at 06:02, Dimiter Naydenov <
> dimiter.nayde...@canonical.com> wrote:
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>>
>> All this is great work! Thanks for the write-up as well.
>>
>> I think I discovered an issue with it - take a look at this bug
>> http://pad.lv/1403738. It seems machine.SetAgentVersion() should be
>> handled specially by the multi-env transaction runner, as only after
>> calling it the upgrade actually starts and the steps to add env-uuids
>> to state collections are executed.
>
>
> Sorry - I neglected to do a manual upgrade test before pushing this
> change. Ensuring that code that runs before database migrations have
> occurred still works as it should has been pain point for us while doing
> the multi-environment work.
>
> I will get this sorted. Thanks for saving me some time by doing the
> initial analysis.
>
> - Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: bugs, fixes and targeting Juju versions

2015-05-04 Thread Menno Smits
git cherry-pick will even grab the top commit of a branch if you give the
branch name (presuming the fix is a single commit). For example:

git checkout -b bug-fix-1.24 upstream/1.24  # create a branch for the fix in 1.24
git cherry-pick bug-fix-master-branch       # pull the fix across

There are various ways of grabbing multiple revisions too.

And of course, as per Ian's recent email, you should be targeting fixes to
the lowest affected version and working forwards. So really, in your example
the fix should be made for 1.24 and then cherry-picked onto a branch made
from master.
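One of those "various ways" of grabbing multiple revisions is the A^..B range form. The throwaway demo below builds a scratch repo to show it; the branch, tag, and file names are invented for the demo, not taken from this thread:

```shell
#!/bin/sh
# Demo of cherry-picking a *range* of commits in a scratch repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"

echo base > file && git add file && git commit -qm base
git branch release                  # stand-in for the 1.24 branch
echo fix1 >> file && git commit -qam fix1
echo fix2 >> file && git commit -qam fix2
git tag fixes                       # marks the tip holding both fixes

git checkout -q release
# A^..B picks commit A and everything up to B (A..B alone would exclude A).
git cherry-pick fixes~1^..fixes
git log --format=%s                 # newest first: fix2, fix1, base
```

The ^ on the start of the range matters: fixes~1..fixes would pick only the fix2 commit.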




On 5 May 2015 at 13:15, Tim Penhey  wrote:

> git cherry-pick does this as a git command.
>
> Tim
>
>
> On 05/05/15 13:03, Jesse Meek wrote:
> > Hi All,
> >
> > tl;dr `git diff --no-prefix master > diff.patch; patch -p0 < diff.patch`
> > is useful for landing bug fixes in different versions of juju.
> >
> > As a lot of us are currently bug hunting and needing to land fixes in
> > multiple versions of Juju, I thought I'd share my process of doing that
> > (maybe it's helpful?):
> >
> > So say you've branched master, let's call it "bug-fix-master-branch",
> > it's got your fix but you need to land it in 1.24. So branch 1.24, let's
> > call it bug-fix-124, and do the following:
> >
> > # generate a diff of your changes that can be used with patch
> > (--no-prefix master is the magic flag that generates the right format)
> > (bug-fix-master-branch) $ git diff --no-prefix master > diff.patch
> >
> > # don't add or commit, checkout the other branch
> > (bug-fix-master-branch) $ git checkout bug-fix-124
> >
> > # diff.patch is still there, unstaged. So use it to add the patch
> > (bug-fix-124) $ patch -p0 < diff.patch
> >
> > # do a sanity check
> > (bug-fix-124) $ git diff
> >
> > # remove the patch file
> > (bug-fix-124) $ rm diff.patch
> >
> > You've now got a bug-fix branch eligible for automatic merging targeting
> > 1.24.
> >
> > Cheers,
> > Jess
> >
>
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Send Juju logs to different database?

2015-05-05 Thread Menno Smits
On 6 May 2015 at 06:49, Lauri Ojansivu  wrote:

> Hi,
> today at Ubuntu Developer Summit:
> http://summit.ubuntu.com/uos-1505/meeting/22437/juju-share-and-juju-sos/
>
> Speakers talked about changing all logging to go to MongoDB, so I asked
> question at IRC:
>  Would it be possible to use some different database than MongoDB
> for logs, because of current problems in MongoDB ?
> https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads
>

It is more likely that Juju will grow the ability to send logs to external
log services using the syslog protocol (and perhaps others). You could use
this to log to your own log aggregator or database. This feature has been
discussed but hasn't been planned in any detail yet (pull requests would be
most welcome!).

The changes currently being made to Juju to support logging to MongoDB also
lay the groundwork for logging to external services. This should land in
the coming weeks.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Send Juju logs to different database?

2015-05-06 Thread Menno Smits
On 6 May 2015 at 19:53, Stuart Bishop  wrote:

> On 6 May 2015 at 04:57, Menno Smits  wrote:
>
> > It is more likely that Juju will grow the ability to send logs to
> external
> > log services using the syslog protocol (and perhaps others). You could
> use
> > this to log to your own log aggregator or database. This feature has been
> > discussed but hasn't been planned in any detail yet (pull requests would
> be
> > most welcome!).
>
> syslog seems a bad fit, as the logs are now structured data and I'd
> like to keep it that way. I guess people want it as an option, but I'd
> consider it the legacy option here.
>

You're right. Structured logs would be much more useful.


> My own use case would be to make a more readable debug-logs, rather
> than attempting to parse the debug-logs output ;) Hmm... I may be able
> to do this already via the Juju API.
>

Not yet. The API used by debug-log currently emits formatted log lines, not
structured log data.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Send Juju logs to different database?

2015-05-06 Thread Menno Smits
On 7 May 2015 at 00:58, Charles Butler  wrote:

>
> On Wed, May 6, 2015 at 3:53 AM, Stuart Bishop  > wrote:
>
>> My own use case would be to make a more readable debug-logs, rather
>> than attempting to parse the debug-logs output ;) Hmm... I may be able
>> to do this already via the Juju API.
>>
>
> Personally - i feel like this is a *great* use case for an opengrok filter
> w/ logstash.  Translate that structured data into something meaningful in
> elasticsearch and search/aggregate/tail like a champ.
>

Agreed - that would be awesome.  When we get to this we should aim to have
Juju emit structured log data that could be used to feed things like
logstash. If we do it right then hopefully grok won't be needed because the
log data will already be structured.

No promises about when this might happen though. At least the logging work
currently being done provides a good basis to add logging to external
systems.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Pruning the txns collection

2015-05-13 Thread Menno Smits
On 14 May 2015 at 06:41, Gustavo Niemeyer 
wrote:

>
> You are right that it's not that simple, but it's not that complex either
> once you understand the background.
>
> Transactions are applied by the txn package by tagging each one of the
> documents that will participate in the transaction with the transaction id
> they are participating in. When mgo goes to apply a transaction in that
> same document, it will tag the document with the new transaction id, and
> then evaluate all the transactions it is part of. If you drop one of the
> transactions that a document claims to be participating in, then the txn
> package will rightfully complain since it cannot tell the state of a
> transaction that explicitly asked to be considered for the given document.
>
> That means the solution is to make sure removed transactions are 1) in a
> final state; and 2) not being referenced by any tagged documents.
>

Thanks. This explanation clarifies things a lot.


>
> The txn package itself collects garbage from old transactions as new
> transactions are applied, but it doesn't guarantee that right after a
> transaction reaches a final state it will be collected. This can lead to
> pretty old transactions being referenced, if these documents are never
> touched again.
>

I was confused by this part when I read it because I don't see anywhere in
the mgo/txn code where cleanup of the txn collection already occurs. To
summarise our later IRC conversation for anyone who might be interested:
mgo/txn doesn't currently prune the txns collection, but it *does* prune
references to applied transactions from the txn-queue fields on documents.


>
> So, you have two choices to collect these old documents:
>
> 1. Clean up the transaction references from all documents
>
> or
>
> 2. Just make sure the transaction being removed is not referenced anywhere
>
> I would personally go for 2, as it is a read-only operation everywhere but
> in the transactions collection itself, to drop the transaction document.
>

I agree that #2 is preferable and I have a fairly straightforward strategy
in mind to make this happen. I'll work on that today.


> Note that the same rules here apply to the stash collection as well.

Noted. I know how this hangs together from my work with PurgeMissing.
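At its core, strategy #2 reduces to a set computation. Here is a minimal sketch with plain Go maps and slices standing in for the Mongo collections (removableTxns and its arguments are illustrative names, not the real implementation):

```go
package main

import "fmt"

// removableTxns applies the rule from the discussion above: a txn may be
// pruned only if it has reached a final state AND no document's txn-queue
// still references it.
func removableTxns(finalState map[string]bool, txnQueues [][]string) []string {
	// Collect every txn id still referenced by some document.
	referenced := make(map[string]bool)
	for _, queue := range txnQueues {
		for _, txnID := range queue {
			referenced[txnID] = true
		}
	}
	// A txn is removable iff it is final and unreferenced.
	var removable []string
	for txnID, final := range finalState {
		if final && !referenced[txnID] {
			removable = append(removable, txnID)
		}
	}
	return removable
}

func main() {
	finalState := map[string]bool{
		"t1": true,  // completed, unreferenced
		"t2": true,  // completed but still referenced below
		"t3": false, // still in progress
	}
	queues := [][]string{{"t2"}, {"t3"}} // txn-queue fields on documents
	fmt.Println(removableTxns(finalState, queues)) // [t1]
}
```

In the real system the "referenced" set would be built by scanning txn-queue fields (a read-only pass, as Gustavo notes), with writes confined to deleting documents from the txns collection itself.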

Thanks,
Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: "fork/exec ... unable to allocate memory"

2015-06-03 Thread Menno Smits
This thread and the ticket linked by Michael got me curious about whether
we could write our own routine for spawning processes that doesn't invoke
the usual copy-on-write semantics.

The struct returned by exec.Command has a SysProcAttr field where you can
set (Linux-specific) flags to pass to the clone syscall. The CLONE_VM flag
looks promising but it seems to upset Go when you use it (fatal error:
runtime: stack growth during syscall). If CLONE_VM | CLONE_VFORK is used
the executable runs - I see the output from echo - but the call to Run
never returns. I'm not sure why given that with CLONE_VFORK the parent
process is supposed to unblock once the child calls execve (which it does).

I could poke at this further but I need to get back to other things.
Looking at https://github.com/golang/go/issues/5838 I'm not the only one
who's tried this and run into similar problems.

Another approach - and one that Go might do internally one day - is to tell
the kernel to not allow copy-on-write for all (or at least large) memory
blocks. Tweaking the allocations in Nate's example like this:

bigs := make([][]byte, 6)
for i := range bigs {
    bigs[i] = make([]byte, GB)
    syscall.Madvise(bigs[i], syscall.MADV_DONTFORK)
}

allows the fork to work; without the Madvise calls it fails as before. This
isn't particularly practical for us but it's an interesting data point
anyway.

- Menno




On 4 June 2015 at 02:07, John Meinel  wrote:

> Yeah, I'm pretty sure this machine is on "0" and we've just overcommitted
> enough that Linux is refusing to overcommit more. I'm pretty sure juju was
> at least at 2GB of pages, where 1G was in RAM and 1GB was in swap. And if
> we've already overcommitted to 9.7GB over 6.2GB linux probably decided that
> another 2GB was "obvious overcommits" that it would refuse.
>
> John
> =:->
>
>
> On Wed, Jun 3, 2015 at 5:32 PM, Gustavo Niemeyer 
> wrote:
>
>> From https://www.kernel.org/doc/Documentation/vm/overcommit-accounting:
>>
>> The Linux kernel supports the following overcommit handling modes
>>
>> 0-   Heuristic overcommit handling. Obvious overcommits of
>>  address space are refused. Used for a typical system. It
>>  ensures a seriously wild allocation fails while allowing
>>  overcommit to reduce swap usage.  root is allowed to
>>  allocate slightly more memory in this mode. This is the
>>  default.
>>
>> 1-   Always overcommit. Appropriate for some scientific
>>  applications. Classic example is code using sparse arrays
>>  and just relying on the virtual memory consisting almost
>>  entirely of zero pages.
>>
>> 2-   Don't overcommit. The total address space commit
>>  for the system is not permitted to exceed swap + a
>>  configurable amount (default is 50%) of physical RAM.
>>  Depending on the amount you use, in most situations
>>  this means a process will not be killed while accessing
>>  pages but will receive errors on memory allocation as
>>  appropriate.
>>
>>  Useful for applications that want to guarantee their
>>  memory allocations will be available in the future
>>  without having to initialize every page.
>>
>>
>> On Wed, Jun 3, 2015 at 7:40 AM, John Meinel 
>> wrote:
>>
>>> So interestingly we are already fairly heavily overcommitted. We have
>>> 4GB of RAM and 4GB of swap available. And cat /proc/meminfo is saying:
>>> CommitLimit: 6214344 kB
>>> Committed_AS:9764580 kB
>>>
>>> John
>>> =:->
>>>
>>>
>>>
>>> On Wed, Jun 3, 2015 at 9:28 AM, Gustavo Niemeyer 
>>> wrote:
>>>
 Ah, and you can also suggest increasing the swap. It would not actually
 be used, but the system would be able to commit to the amount of memory
 required, if it really had to.
  On Jun 3, 2015 1:24 AM, "Gustavo Niemeyer" 
 wrote:

> Hey John,
>
> It's probably an overcommit issue. Even if you don't have the memory
> in use, cloning it would mean the new process would have a chance to 
> change
> that memory and thus require real memory pages, which the system obviously
> cannot give it. You can workaround that by explicitly enabling overcommit,
> which means the potential to crash late in strange places in the bad case,
> but would be totally okay for the exec situation.
> So we're running into this failure mode again at one of our sites.
>
> Specifically, the system is running with a reasonable number of nodes
> (~100) and has been running for a while. It appears that it wanted to
> restart itself (I don't think it restarted jujud, but I do think it at
> least restarted a lot of the workers.)
> Anyway, we have a fair number of things that we "exec" during startup
> (kvm-ok, restart rsyslog, etc).
> But when we get into this situation (whatever it ac

Re: "fork/exec ... unable to allocate memory"

2015-06-03 Thread Menno Smits
On 4 June 2015 at 11:56, Menno Smits  wrote:

>
> bigs := make([][]byte, 6)
>

Note: I was using 6GB because my machine is running a bunch of VMs and has
very little free memory at the moment. With the 14GB allocated in the
example, memory runs out before the program gets to run echo. Anyone trying
to run the example on their own machine will need to tweak this value to
suit the amount of available memory.


Re: workers using *state.State

2015-09-08 Thread Menno Smits
Hey Will,

FWIW, I'm responsible for 2 of them - dblogpruner and txnpruner. They were
created before I'd ever heard anything about workers not using *state.State
directly, certainly before the recent push to clean such workers up.
They're not all that new and weren't created in violation of explicit
instructions (the instructions came later).

When I created these workers I looked at the existing code and saw that
workers which only ran within the state servers used State directly so
that's what I did. These workers are very much state server specific so it
seemed sensible to take this approach.

I will update these workers to go via the API and cards already exist on
our team's board. It's just a matter of finding time amongst everything
else.

- Menno







On 8 September 2015 at 19:12, William Reade 
wrote:

> People keep writing them, in defiance of sane layering and explicit
> instructions, for the most embarrassingly trivial tasks
> (statushistorypruner? dblogpruner? txnpruner? *all* of those can and should
> pass through a simple api facade, not just dance off to play with the
> direct-db-access fairies.)
>
> There is no justification for *any* of those things to see a *state.State,
> and I'm going to start treating new workers that violate layering this way
> as deliberate sabotage attempts. Leads who have overseen the introduction
> of those workers, sort it out.
>
>
>


Re: workers using *state.State

2015-09-08 Thread Menno Smits
You missed another worker that needs updating: envWorkerManager. Its use of
*state.State is a little less obvious.

Ticket and card added: https://bugs.launchpad.net/juju-core/+bug/1493606

On 9 September 2015 at 12:43, Tim Penhey  wrote:

> On 09/09/15 12:36, Horacio Duran wrote:
> > There is lazy and there is also "I just based in that other worker"
> > which happens, I am the proud parent of statushistorypruner and a
> > rewrite is underway too, sorry.
>
> Don't get me wrong, lazy developers are generally good. We try to find
> the simplest thing that will work.
>
> Tim
>
>
>


Re: workers using *state.State

2015-09-08 Thread Menno Smits
On 9 September 2015 at 12:51, Tim Penhey  wrote:

> On 09/09/15 12:47, Menno Smits wrote:
> > You missed another worker that needs updating: envWorkerManager. Its use
> > of *state.State is a little less obvious.
>
> I had left that one off because I thought it only had the state instance
> to pass on to other workers.
>
> But I guess it does need updating, so thank you.
>
> Tim
>
>
It also uses the State itself to call State.WatchEnvironments (which will
need to be exposed via the API as part of the clean up).


Re: Running upgrade steps for units

2015-09-15 Thread Menno Smits
On 16 September 2015 at 08:41, Tim Penhey  wrote:

> On 15/09/15 19:38, William Reade wrote:
> > Having the machine agent run unit agent upgrade steps would be a Bad
> > Thing -- the unit agents are still actively running the old code at that
> > point. Stopping the unit agents and managing the upgrade purely from the
> > machine would be ok; but it feels like a lot of effort for very little
> > payoff, so I'm most inclined to WONTFIX it and spend the energy on agent
> > consolidation instead.
>
> This still leaves us with the problem of the two upgrade steps that were
> written to update the uniter state file, and how to handle this.
>

If the work that these upgrade steps did is fairly trivial we could have
the unit agents run a function which does the upgrade work as it comes up,
before workers are started. This might be an acceptable solution if we're
going to merge machine and unit agents soon[1] anyway.

I had thought it might be reasonably easy to get the upgrade machinery
working within the unit agent but now that I've looked at the code I can
see that it's a fairly major undertaking (to do it Right at least).

- Menno


[1] FSVO "soon"


Re: On Call Reviewer Duties

2015-10-21 Thread Menno Smits
+1

On 22 October 2015 at 00:52, Nate Finch  wrote:

> Sounds good to me.
>
> On Wed, Oct 21, 2015, 6:39 AM Matthew Williams <
> matthew.willi...@canonical.com> wrote:
>
>> Hey Folks,
>>
>> I propose that on call reviewers as part of their job should also review
>> the issues list on https://github.com/juju/juju/issues and attempt to
>> triage anything that's there. We moved to github to improve collaboration,
>> but the issues list is often left ignored. If every OCR took a look at it
>> we'd get it under control very quickly
>>
>> Thoughts
>>
>> Matty
>>
>
>
>


Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
I implemented the ability for Juju's logs to go to MongoDB some time ago.
This feature is available in 1.25 behind a feature flag and also gets
enabled automatically if the "jes" feature flag is enabled (logging via
rsyslog doesn't make much sense when using multiple environments in the one
system).

There's also a separate feature flag which turns off rsyslog-based
logging.

The "jes" feature flag will be removed soon meaning that database logging
will need to become default functionality. The question is do we then also
remove logging to rsyslog functionality, or perhaps reverse the sense of
the rsyslog feature flag so that it's off by default but can be enabled if
people want it for some reason?

I'm confident that the logging to the database feature is solid. I spent a
lot of time confirming that performance wouldn't be an issue. The code is
well tested. Automatic log rotation is implemented.

The main issue I can see is that once rsyslog based logging is turned off
we lose the all-machines.log file which some people and systems no doubt
rely on. The logs for an environment can of course still be retrieved using
the "juju debug-log" command.

If we really want something like all-machines.log, it wouldn't be /too/
hard to implement a worker which generates an all-machines.log style file
in real-time from the database logs. All the pieces to implement that
already exist, but I really don't have the bandwidth this cycle to do the
work.

Thoughts?

- Menno


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 00:50, Marco Ceppi  wrote:

> How are current customers, like dtag, who are very keen on having
> everything go to syslog/rsyslog, going to keep that functionality going
> forward? Is there an easy mechanism to get the logs out of mongodb?
>
> All too often when I'm rummaging around in debug-hooks, I'll tail
> /var/log/juju/unit-*.log to get an idea of what to expect or see what
> failed. Does that workflow change with this feature?
>
>
Sorry, I should have been clearer. This change doesn't affect the existing
unit-*.log and machine-*.log files. It's only about the streaming of logs
to the API servers. The workflow you describe will continue to work.


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 00:50, Marco Ceppi  wrote:

> How are current customers, like dtag, who are very keen on having
> everything go to syslog/rsyslog, going to keep that functionality going
> forward? Is there an easy mechanism to get the logs out of mongodb?
>

AFAIK we've never supported sending Juju's logs to an external syslog so
nothing changes for existing users there. The logging-to-MongoDB change
doesn't affect any use of rsyslog by services deployed by Juju - this is
only about Juju's own logs.

The work done for the logging-to-MongoDB feature does pave the way for
sending Juju's logs to external logging systems but we haven't taken it
that far yet.


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 22 October 2015 at 22:53, Andrew Wilkins 
wrote:

> On Thu, Oct 22, 2015 at 5:39 PM roger peppe 
> wrote:
>
>> This seems reasonable to me. For myself, the time I'm looking at
>> all-machines.log
>> is often when the system is broken. I wonder whether a reasonable
>> substitute might be a low level tool that can operate directly on the
>> mongo
>> database to fetch the logs, bypassing the juju system.
>>
>
> +1
> I'd like to see rsyslog gone, and the new logging to be enabled by
> default, but I think we do need some tooling to extract logs when things go
> awry.
>

OK, we'll work on a utility which can be run on a Juju controller server to
extract all the logs from the database, even when the Juju server isn't
running.

Note that there is also already a last resort logsink.log file generated on
each Juju controller server which records the last 1GB of logs received by
that server. If you grab this file from each server and merge them you end
up with all the recent logs.


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 22 October 2015 at 23:04, Adam Collard 
wrote:

> On Thu, 22 Oct 2015 at 10:39 roger peppe 
> wrote:
>
>> I have one objection to the debug-log command - it doesn't
>> appear to be possible to get the log up until the current time
>> without blocking to wait for more messages when it gets
>> to the end. So it's not quite a substitute because I can't easily
>> grep the log without interrupting the grep command.
>>
>
> FWIW this is
> https://bugs.launchpad.net/juju-core/+bug/1390585
>

This should be quite easy to fix. Onyx will deal with this too.


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 04:17, Nate Finch  wrote:

> IMO, all-machines.log is a bad idea anyway (it duplicates what's in the
> log files already, and makes it very likely that the state machines will
> run out of disk space, since they're potentially aggregating hundreds or
> thousands of machines' logs, not to mention adding a lot of network
> overhead). I'd be happy to see it go away.
>

Collecting the logs in a central place provides a single audit log and
makes it possible to get the big picture when investigating why something
happened. It also means we still have logs even when machines have been
decommissioned.


> However, I am not convinced that dropping text file logs in general is a
> good idea, so I'd love to hear what we're gaining by putting logs in Mongo.
>

The machine-*.log and unit-*.log files will remain as they are now. This
change only affects the way logs get to the Juju controllers and how
they're stored there.

The main driver for putting the logs into the database was so we can
cleanly separate logs for different environments running under a single
controller. This is difficult to achieve reliably with rsyslog. There are
other benefits too: more powerful filtering of logs, faster log queries
(via indexes), allowing for structured log data to be emitted.


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 05:02, Stuart Bishop 
wrote:

>
> I'm looking forward to having access to them in a structured format so
> I can generate logs, reports and displays the way I like rather than
> dealing with the hard to parse strings in the text logs. 'juju
> debug-logs [--from ts] [--until ts] [-F] [--format=json]' would keep
> me quite happy and I can filter, format, interleave and colorize the
> output to my hearts content. I can even generate all-machines.log if I
> feel like a headache ;)
>

Output of structured log data is now very possible. Someone just needs to
make the API server and client changes to support it. I don't think it's
something that Onyx can commit to for this cycle but it might end up being
one of those things that happens over the course of a few Friday afternoons.


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 03:16, Aaron Bentley 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> > The main issue I can see is that once rsyslog based logging is
> > turned off we lose the all-machines.log file which some people and
> > systems no doubt rely on. The logs for an environment can of course
> > still be retrieved using the "juju debug-log" command.
>
> all-machines.log is one of many log files CI stores so that failures
> can be debugged afterwards.
>

Once logging to the database becomes the default, all-machines.log won't be
there any more. It helps that the various machine-*.log files are already
also being collected by the CI infrastructure where possible. This
mitigates the issue somewhat.

What about running "juju debug-log" for the duration of each CI test? This
would mean all the logs end up on the host running the tests negating the
need to scp a file.

Alternatively, the to-be-created tool to dump the logs from the database
could be used after a test completes.


Renaming of environments

2015-10-22 Thread Menno Smits
While working on the environment migrations spec I noticed that it is
currently not possible to change the name of a Juju environment once it has
been created. This creates an unfortunate corner case in the context of
environment migrations, where you might have an environment that can't be
migrated to another controller because you've already created another
(unrelated) environment with the same name on the target controller.

Rick pointed out that it would also be nice to be able to rename an
environment when its purpose has changed. For example, you might have
created an environment called "test" which you build up and end up using
for production purposes. At that point the environment name doesn't make
much sense.

We will fix this. The rename itself is fairly easy to implement but
environment names have also been used as part of things such as EC2 and
Openstack security group names so this will need to change too. It would be
better if the names of external environment-related resources used the
environment UUID instead. There is a card for this work in Onyx's backlog.

So just a heads up that this is a current weakness which will get addressed.

- Menno


More on logging changes in Juju 1.26.0

2015-11-08 Thread Menno Smits
In a recent thread I highlighted the upcoming change in Juju's logging
where logs from machine and unit agents will be streamed over Juju's API
instead of via rsyslogd. Two things came out of that discussion:

1. a desire for juju debug-log to support a "non-tailing" mode (as per bug
1390585)
2. a desire for a tool to dump the logs out of the database in the case of
Juju being unavailable for some reason.

I'm happy to report that #1 is being worked on right now as part of wider
changes to remove the "jes" and "db-log" feature flags, and #2 is done as
of today. Under 1.26, machine agents will install a /usr/bin/juju-dumplogs
command which can be run on any state server to generate complete dumps of
all stored logs (one file per environment).

There are some things to be aware of for when existing environments are
upgraded to 1.26.0 or later:

   - The existing all-machines.log will stop receiving updates very soon
   after the state server upgrades to the new Juju version.
   - all-machines.log will be left behind, but will be compressed.
   - Some text will be appended to the bottom of all-machines.log
   explaining that the file is no longer used with some pointers to relevant
   documentation.
   - There will be a small window where the state servers have upgraded to
   the new Juju version but some agents haven't. Those agents will continue
   trying to log to rsyslogd but it will no longer be running. Logs sent
   during that period will not be recorded in all-machines.log or in the new
   logs database. They will still appear in the machine or unit's local log
   file however.
   - We do not plan to pre-seed the logs database with existing logs from
   all-machines.log.

As always, questions and comments welcome.

- Menno

