Re: A cautionary tale - mgo asserts

2016-06-08 Thread Gustavo Niemeyer
Is it mgo/txn that is internally unmarshalling onto that?

Let's get that fixed at its heart.
On Jun 8, 2016 12:27 PM, "roger peppe" <roger.pe...@canonical.com> wrote:

> The Assert field in mgo/txn.Op is an interface{}, so
> when it's marshaled and unmarshaled, the order
> can change because unmarshaling unmarshals as bson.M
> which does not preserve key order.
>
> https://play.golang.org/p/_1ZPl7iMyn
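For illustration, here is a minimal standalone sketch (assuming gopkg.in/mgo.v2/bson, with a struct that merely mimics the shape of txn.Op) of how an interface{} Assert loses its ordering once it round-trips through bson:

    package main

    import (
        "fmt"

        "gopkg.in/mgo.v2/bson"
    )

    // Op mimics the shape of txn.Op: the Assert field is an interface{}.
    type Op struct {
        Assert interface{}
    }

    func main() {
        in := Op{Assert: bson.D{{"a", 1}, {"b", 2}}}

        data, err := bson.Marshal(in)
        if err != nil {
            panic(err)
        }

        var out Op
        if err := bson.Unmarshal(data, &out); err != nil {
            panic(err)
        }

        // out.Assert comes back as a bson.M (a Go map), so the a/b order is gone.
        fmt.Printf("%#v\n", out.Assert)
    }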
>
> On 8 June 2016 at 15:55, Gustavo Niemeyer
> <gustavo.nieme...@canonical.com> wrote:
> > Is it mgo itself that is changing the order internally?
> >
> > It should not do that.
> >
> > On Wed, Jun 8, 2016 at 8:00 AM, roger peppe <rogpe...@gmail.com> wrote:
> >>
> >> OK, I understand now, I think.
> >>
> >> The underlying problem is that subdocument searches in MongoDB
> >> are order-sensitive.
> >>
> >> For example, I just tried this in a mongo shell:
> >>
> >> > db.foo.insert({_id: "one", x: {a: 1, b: 2}})
> >> > db.foo.find({x: {a: 1, b: 2}})
> >> { "_id" : "one", "x" : { "a" : 1, "b" : 2 } }
> >> > db.foo.find({x: {b: 2, a: 1}})
> >> >
> >>
> >> The second find doesn't return anything even though it contains
> >> the same fields with the same values as the first.
> >>
> >> Urk. I did not know about that. What a gotcha!
> >>
> >> So it *could* technically be OK if the fields in the struct (and
> >> any bson.D) are lexically ordered to match the bson Marshaler,
> >> but well worth avoiding.
> >>
> >> I think things would be considerably improved if mgo/bson preserved
> >> order by default (by using bson.D) when unmarshaling.
> >> Then at least you'd know that the assertion you specify
> >> is exactly the one that gets executed.
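As an illustration of that suggestion, a minimal sketch (assuming gopkg.in/mgo.v2/bson) showing that unmarshaling explicitly into a bson.D keeps the document order, while bson.M does not:

    package main

    import (
        "fmt"

        "gopkg.in/mgo.v2/bson"
    )

    func main() {
        data, err := bson.Marshal(bson.D{{"b", 2}, {"a", 1}})
        if err != nil {
            panic(err)
        }

        // Unmarshaling into bson.M (a map) loses the key order...
        var m bson.M
        if err := bson.Unmarshal(data, &m); err != nil {
            panic(err)
        }
        fmt.Println("as bson.M:", m)

        // ...while unmarshaling into bson.D preserves it.
        var d bson.D
        if err := bson.Unmarshal(data, &d); err != nil {
            panic(err)
        }
        fmt.Println("as bson.D:", d)
    }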
> >>
> >>   cheers,
> >> rog.
> >>
> >>
> >>
> >>
> >> On 8 June 2016 at 10:42, Menno Smits <menno.sm...@canonical.com> wrote:
> >> >
> >> >
> >> > On 8 June 2016 at 21:05, Tim Penhey <tim.pen...@canonical.com> wrote:
> >> >>
> >> >> Hi folks,
> >> >>
> >> >> tl;dr: do not use structs in transaction asserts
> >> >>
> >> >> ...
> >> >>
> >> >> The solution is not to use struct equality on a field, even though it is
> >> >> easy to write, but to use dotted field notation to check the embedded
> >> >> field values.
> >> >
> >> >
> >> >
> >> > To give a more concrete example, asserting on an embedded document
> >> > field like this is problematic:
> >> >
> >> >   ops := []txn.Op{{
> >> >       C: "collection",
> >> >       Id: ...,
> >> >       Assert: bson.D{{"some-field", Thing{A: "foo", B: 99}}},
> >> >       Update: ...,
> >> >   }}
> >> >
> >> > Due to the way mgo works[1], the document the transaction operation is
> >> > asserting against may have been written with A and B in reverse order, or
> >> > the Thing struct in the Assert may have A and B swapped by the time it's
> >> > used. Either way, the assertion will fail randomly.
> >> >
> >> > The correct approach is to express the assertion like this:
> >> >
> >> >   ops := []txn.Op{{
> >> >       C: "collection",
> >> >       Id: ...,
> >> >       Assert: bson.D{
> >> >           {"some-field.A", "foo"},
> >> >           {"some-field.B", 99},
> >> >       },
> >> >       Update: ...,
> >> >   }}
> >> >
> >> > or this:
> >> >
> >> >   ops := []txn.Op{{
> >> >       C: "collection",
> >> >       Id: ...,
> >> >       Assert: bson.M{
> >> >           "some-field.A": "foo",
> >> >           "some-field.B": 99,
> >> >       },
> >> >       Update: ...,
> >> >   }}
> >> >
> >> >>
> >> >> Yet another thing to add to the list of things to check when doing
> >> >> reviews.
> >> >
> >> >
> >> > I think we can go a bit further and error on attempts to use structs for
> >> > comparison in txn.Op asserts in Juju's txn layers in state. Just as we
> >> > already do some munging and checking of database operations to ensure
> >> > correct multi-model behaviour, we should be able to do the same for this
> >> > issue and prevent it from happening again.
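A rough sketch of what such a guard might look like; the helper below is hypothetical (not actual Juju code) and only illustrates the idea of rejecting struct values in asserts:

    package txncheck

    import (
        "fmt"
        "reflect"
        "time"

        "gopkg.in/mgo.v2/bson"
    )

    // CheckAssert is a hypothetical guard a txn layer could run over each
    // txn.Op.Assert before submitting it, rejecting struct values whose
    // field order would be significant in a MongoDB subdocument match.
    func CheckAssert(assert interface{}) error {
        switch a := assert.(type) {
        case nil:
            return nil
        case bson.D:
            for _, elem := range a {
                if err := checkValue(elem.Name, elem.Value); err != nil {
                    return err
                }
            }
            return nil
        case bson.M:
            for name, value := range a {
                if err := checkValue(name, value); err != nil {
                    return err
                }
            }
            return nil
        default:
            return checkValue("<assert>", assert)
        }
    }

    func checkValue(name string, value interface{}) error {
        v := reflect.ValueOf(value)
        for v.Kind() == reflect.Ptr || v.Kind() == reflect.Interface {
            v = v.Elem()
        }
        // time.Time is a struct but marshals to a single BSON value, so allow it.
        if v.Kind() == reflect.Struct && v.Type() != reflect.TypeOf(time.Time{}) {
            return fmt.Errorf("assert on %q uses struct %s; use dotted field names instead", name, v.Type())
        }
        return nil
    }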
> >> >
> >> > - Menno
> >> >
> >> > [1] If transaction operations are loaded and used from the DB (more likely
> >> > under load when multiple runners are acting concurrently), the Insert,
> >> > Update and Assert fields are loaded as bson.M (this is what the bson
> >> > Unmarshaller does for interface{} typed fields). Once this happens, field
> >> > ordering is lost.
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Juju-dev mailing list
> >> > Juju-dev@lists.ubuntu.com
> >> > Modify settings or unsubscribe at:
> >> > https://lists.ubuntu.com/mailman/listinfo/juju-dev
> >> >
> >>
> >> --
> >> Juju-dev mailing list
> >> Juju-dev@lists.ubuntu.com
> >> Modify settings or unsubscribe at:
> >> https://lists.ubuntu.com/mailman/listinfo/juju-dev
> >
> >
> >
> >
> > --
> > gustavo @ http://niemeyer.net
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: A cautionary tale - mgo asserts

2016-06-08 Thread Gustavo Niemeyer
Is it mgo itself that is changing the order internally?

It should not do that.

On Wed, Jun 8, 2016 at 8:00 AM, roger peppe  wrote:

> OK, I understand now, I think.
>
> The underlying problem is that subdocument searches in MongoDB
> are order-sensitive.
>
> For example, I just tried this in a mongo shell:
>
> > db.foo.insert({_id: "one", x: {a: 1, b: 2}})
> > db.foo.find({x: {a: 1, b: 2}})
> { "_id" : "one", "x" : { "a" : 1, "b" : 2 } }
> > db.foo.find({x: {b: 2, a: 1}})
> >
>
> The second find doesn't return anything even though it contains
> the same fields with the same values as the first.
>
> Urk. I did not know about that. What a gotcha!
>
> So it *could* technically be OK if the fields in the struct (and
> any bson.D) are lexically ordered to match the bson Marshaler,
> but well worth avoiding.
>
> I think things would be considerably improved if mgo/bson preserved
> order by default (by using bson.D) when unmarshaling.
> Then at least you'd know that the assertion you specify
> is exactly the one that gets executed.
>
>   cheers,
> rog.
>
>
>
>
> On 8 June 2016 at 10:42, Menno Smits  wrote:
> >
> >
> > On 8 June 2016 at 21:05, Tim Penhey  wrote:
> >>
> >> Hi folks,
> >>
> >> tl;dr: do not use structs in transaction asserts
> >>
> >> ...
> >>
> >> The solution is not to use struct equality on a field, even though it is
> >> easy to write, but to use dotted field notation to check the embedded
> >> field values.
> >
> >
> >
> > To give a more concrete example, asserting on an embedded document field
> > like this is problematic:
> >
> >   ops := []txn.Op{{
> >       C: "collection",
> >       Id: ...,
> >       Assert: bson.D{{"some-field", Thing{A: "foo", B: 99}}},
> >       Update: ...,
> >   }}
> >
> > Due to the way mgo works[1], the document the transaction operation is
> > asserting against may have been written with A and B in reverse order, or
> > the Thing struct in the Assert may have A and B swapped by the time it's
> > used. Either way, the assertion will fail randomly.
> >
> > The correct approach is to express the assertion like this:
> >
> >   ops := []txn.Op{{
> >       C: "collection",
> >       Id: ...,
> >       Assert: bson.D{
> >           {"some-field.A", "foo"},
> >           {"some-field.B", 99},
> >       },
> >       Update: ...,
> >   }}
> >
> > or this:
> >
> >   ops := []txn.Op{{
> >       C: "collection",
> >       Id: ...,
> >       Assert: bson.M{
> >           "some-field.A": "foo",
> >           "some-field.B": 99,
> >       },
> >       Update: ...,
> >   }}
> >
> >>
> >> Yet another thing to add to the list of things to check when doing
> >> reviews.
> >
> >
> > I think we can go a bit further and error on attempts to use structs for
> > comparison in txn.Op asserts in Juju's txn layers in state. Just as we
> > already do some munging and checking of database operations to ensure
> > correct multi-model behaviour, we should be able to do the same for this
> > issue and prevent it from happening again.
> >
> > - Menno
> >
> > [1] If transaction operations are loaded and used from the DB (more likely
> > under load when multiple runners are acting concurrently), the Insert,
> > Update and Assert fields are loaded as bson.M (this is what the bson
> > Unmarshaller does for interface{} typed fields). Once this happens, field
> > ordering is lost.
> >
> >
> >
> >
> > --
> > Juju-dev mailing list
> > Juju-dev@lists.ubuntu.com
> > Modify settings or unsubscribe at:
> > https://lists.ubuntu.com/mailman/listinfo/juju-dev
> >
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>



-- 
gustavo @ http://niemeyer.net
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: fork/exec ... unable to allocate memory

2015-06-03 Thread Gustavo Niemeyer
From https://www.kernel.org/doc/Documentation/vm/overcommit-accounting:

The Linux kernel supports the following overcommit handling modes

0   -   Heuristic overcommit handling. Obvious overcommits of
        address space are refused. Used for a typical system. It
        ensures a seriously wild allocation fails while allowing
        overcommit to reduce swap usage.  root is allowed to
        allocate slightly more memory in this mode. This is the
        default.

1   -   Always overcommit. Appropriate for some scientific
        applications. Classic example is code using sparse arrays
        and just relying on the virtual memory consisting almost
        entirely of zero pages.

2   -   Don't overcommit. The total address space commit
        for the system is not permitted to exceed swap + a
        configurable amount (default is 50%) of physical RAM.
        Depending on the amount you use, in most situations
        this means a process will not be killed while accessing
        pages but will receive errors on memory allocation as
        appropriate.

        Useful for applications that want to guarantee their
        memory allocations will be available in the future
        without having to initialize every page.
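For reference, the mode can be switched at runtime with "sysctl vm.overcommit_memory=1" or by writing to /proc/sys/vm/overcommit_memory. A minimal Go sketch of the same thing (requires root; purely illustrative):

    package main

    import (
        "fmt"
        "io/ioutil"
        "strings"
    )

    func main() {
        const path = "/proc/sys/vm/overcommit_memory"

        cur, err := ioutil.ReadFile(path)
        if err != nil {
            panic(err)
        }
        fmt.Println("current overcommit mode:", strings.TrimSpace(string(cur)))

        // Mode 1 = always overcommit (see the kernel documentation above).
        if err := ioutil.WriteFile(path, []byte("1\n"), 0644); err != nil {
            panic(err)
        }
        fmt.Println("overcommit mode set to 1")
    }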


On Wed, Jun 3, 2015 at 7:40 AM, John Meinel j...@arbash-meinel.com wrote:

 So interestingly we are already fairly heavily overcommitted. We have 4GB
 of RAM and 4GB of swap available. And cat /proc/meminfo is saying:
 CommitLimit: 6214344 kB
 Committed_AS: 9764580 kB

 John
 =:-



 On Wed, Jun 3, 2015 at 9:28 AM, Gustavo Niemeyer gust...@niemeyer.net
 wrote:

 Ah, and you can also suggest increasing the swap. It would not actually
 be used, but the system would be able to commit to the amount of memory
 required, if it really had to.
  On Jun 3, 2015 1:24 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:

 Hey John,

 It's probably an overcommit issue. Even if you don't have the memory in
 use, cloning it would mean the new process would have a chance to change
 that memory and thus require real memory pages, which the system obviously
 cannot give it. You can work around that by explicitly enabling overcommit,
 which means the potential to crash late in strange places in the bad case,
 but would be totally okay for the exec situation.
 So we're running into this failure mode again at one of our sites.

 Specifically, the system is running with a reasonable number of nodes
 (~100) and has been running for a while. It appears that it wanted to
 restart itself (I don't think it restarted jujud, but I do think it at
 least restarted a lot of the workers.)
 Anyway, we have a fair number of things that we exec during startup
 (kvm-ok, restart rsyslog, etc).
 But when we get into this situation (whatever it actually is) then we
 can't exec anything and we start getting failures.

 Now, this *might* be a golang bug.

 When I was trying to debug it in the past, I created a small program
 that just allocated big slices of memory (10MB strings, IIRC) and then
 tried to run "echo hello" until it started failing.
 IIRC the failure point was when I wasn't using swap and the allocated
 memory was 50% of total available memory. (I have 8GB of RAM, it would
 start failing once we had allocated 4GB of strings).
 When I tried digging into the golang code, it looked like they use
 clone(2) as the "create a new process for exec" function. And it seemed it
 wasn't playing nicely with copy-on-write. At least, it appeared that
 instead of doing a simple copy-on-write clone without allocating any new
 memory and then exec'ing into a new process, it actually required having
 enough RAM available for the new process.

 On the customer site, though, jujud has a RES size of only 1GB, and they
 have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
 use).

 The only workaround I can think of is for us to create a "forker"
 process right away at startup, to which we just send RPC requests to run a
 command for us and return the results. ATM I don't think we fork and run
 anything interactively such that we need the stdin/stdout file handles inside
 our process.

 I'd rather just have golang fork() work even when the current process is
 using a large amount of RAM.

 Any of the golang folks know what is going on?

 John
 =:-


 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev





-- 

gustavo @ http://niemeyer.net
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: fork/exec ... unable to allocate memory

2015-06-02 Thread Gustavo Niemeyer
Hey John,

It's probably an overcommit issue. Even if you don't have the memory in
use, cloning it would mean the new process would have a chance to change
that memory and thus require real memory pages, which the system obviously
cannot give it. You can work around that by explicitly enabling overcommit,
which means the potential to crash late in strange places in the bad case,
but would be totally okay for the exec situation.
So we're running into this failure mode again at one of our sites.

Specifically, the system is running with a reasonable number of nodes
(~100) and has been running for a while. It appears that it wanted to
restart itself (I don't think it restarted jujud, but I do think it at
least restarted a lot of the workers.)
Anyway, we have a fair number of things that we exec during startup
(kvm-ok, restart rsyslog, etc).
But when we get into this situation (whatever it actually is) then we can't
exec anything and we start getting failures.

Now, this *might* be a golang bug.

When I was trying to debug it in the past, I created a small program that
just allocated big slices of memory (10MB strings, IIRC) and then tried to
run "echo hello" until it started failing.
IIRC the failure point was when I wasn't using swap and the allocated
memory was 50% of total available memory. (I have 8GB of RAM, it would
start failing once we had allocated 4GB of strings).
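A minimal sketch of that kind of reproduction (the exact details of the original test program aren't preserved here, so this is only an approximation): allocate memory in 10MB chunks and try to exec a trivial command after each allocation:

    package main

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    func main() {
        var hold []string
        for i := 1; ; i++ {
            // Allocate a fresh 10MB string and keep it alive.
            hold = append(hold, strings.Repeat("x", 10*1024*1024))

            // Try to fork/exec a trivial command; report when it starts failing.
            if _, err := exec.Command("echo", "hello").Output(); err != nil {
                fmt.Printf("exec failed after ~%dMB allocated: %v\n", i*10, err)
                return
            }
        }
    }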
When I tried digging into the golang code, it looked like they use clone(2)
as the "create a new process for exec" function. And it seemed it wasn't
playing nicely with copy-on-write. At least, it appeared that instead of
doing a simple copy-on-write clone without allocating any new memory and
then exec'ing into a new process, it actually required having enough RAM
available for the new process.

On the customer site, though, jujud has a RES size of only 1GB, and they
have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
use).

The only workaround I can think of is for us to create a "forker" process
right away at startup, to which we just send RPC requests to run a command
for us and return the results. ATM I don't think we fork and run anything
interactively such that we need the stdin/stdout file handles inside our
process.

I'd rather just have golang fork() work even when the current process is
using a large amount of RAM.

Any of the golang folks know what is going on?

John
=:-


--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/juju-dev
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: fork/exec ... unable to allocate memory

2015-06-02 Thread Gustavo Niemeyer
Ah, and you can also suggest increasing the swap. It would not actually be
used, but the system would be able to commit to the amount of memory
required, if it really had to.
 On Jun 3, 2015 1:24 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:

 Hey John,

 It's probably an overcommit issue. Even if you don't have the memory in
 use, cloning it would mean the new process would have a chance to change
 that memory and thus require real memory pages, which the system obviously
 cannot give it. You can work around that by explicitly enabling overcommit,
 which means the potential to crash late in strange places in the bad case,
 but would be totally okay for the exec situation.
 So we're running into this failure mode again at one of our sites.

 Specifically, the system is running with a reasonable number of nodes
 (~100) and has been running for a while. It appears that it wanted to
 restart itself (I don't think it restarted jujud, but I do think it at
 least restarted a lot of the workers.)
 Anyway, we have a fair number of things that we exec during startup
 (kvm-ok, restart rsyslog, etc).
 But when we get into this situation (whatever it actually is) then we
 can't exec anything and we start getting failures.

 Now, this *might* be a golang bug.

 When I was trying to debug it in the past, I created a small program that
 just allocated big slices of memory (10MB strings, IIRC) and then tried to
 run "echo hello" until it started failing.
 IIRC the failure point was when I wasn't using swap and the allocated
 memory was 50% of total available memory. (I have 8GB of RAM, it would
 start failing once we had allocated 4GB of strings).
 When I tried digging into the golang code, it looked like they use
 clone(2) as the "create a new process for exec" function. And it seemed it
 wasn't playing nicely with copy-on-write. At least, it appeared that
 instead of doing a simple copy-on-write clone without allocating any new
 memory and then exec'ing into a new process, it actually required having
 enough RAM available for the new process.

 On the customer site, though, jujud has a RES size of only 1GB, and they
 have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
 use).

 The only workaround I can think of is for us to create a "forker" process
 right away at startup, to which we just send RPC requests to run a command
 for us and return the results. ATM I don't think we fork and run anything
 interactively such that we need the stdin/stdout file handles inside our
 process.
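A rough sketch of that "forker" idea: a tiny helper started early (while the parent is still small) that runs commands on the parent's behalf. The line-based protocol here is purely illustrative:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "os/exec"
        "strings"
    )

    // Reads one command line per request on stdin, runs it, and reports the
    // result on stdout, so the (large) parent process never has to fork itself.
    func main() {
        in := bufio.NewScanner(os.Stdin)
        out := bufio.NewWriter(os.Stdout)

        for in.Scan() {
            fields := strings.Fields(in.Text())
            if len(fields) == 0 {
                continue
            }
            output, err := exec.Command(fields[0], fields[1:]...).CombinedOutput()
            if err != nil {
                fmt.Fprintf(out, "ERR %v\n", err)
            } else {
                fmt.Fprintf(out, "OK %d bytes of output\n", len(output))
            }
            out.Flush()
        }
    }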

 I'd rather just have golang fork() work even when the current process is
 using a large amount of RAM.

 Any of the golang folks know what is going on?

 John
 =:-


 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Pruning the txns collection

2015-05-13 Thread Gustavo Niemeyer
Hey Menno,

I'm copying the list to ensure we have this documented somewhere for future
reference.

You are right that it's not that simple, but it's not that complex either
once you understand the background.

Transactions are applied by the txn package by tagging each of the
documents that participate in the transaction with the id of that
transaction. When mgo goes to apply another transaction on that same
document, it will tag the document with the new transaction id, and
then evaluate all the transactions it is part of. If you drop one of the
transactions that a document claims to be participating in, the txn
package will rightfully complain, since it cannot tell the state of a
transaction that explicitly asked to be considered for the given document.

That means the solution is to make sure removed transactions are 1) in a
final state; and 2) not being referenced by any tagged documents.

The txn package itself collects garbage from old transactions as new
transactions are applied, but it doesn't guarantee that right after a
transaction reaches a final state it will be collected. This can lead to
pretty old transactions being referenced, if these documents are never
touched again.

So, you have two choices to collect these old documents:

1. Clean up the transaction references from all documents

or

2. Just make sure the transaction being removed is not referenced anywhere

I would personally go for 2, as it is a read-only operation everywhere but
in the transactions collection itself, to drop the transaction document.

Note that the same rules here apply to the stash collection as well.
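A rough sketch of option 2 (not production code; the state codes, the "txn-queue" token format and the collection names are assumptions about mgo/txn internals, so double-check them against the txn package before relying on this):

    package prune

    import (
        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
    )

    // isReferenced reports whether any document in the given collections still
    // carries a token for txnId in its txn-queue field. Tokens are assumed to
    // have the form "<txn id hex>_<nonce>".
    func isReferenced(db *mgo.Database, collections []string, txnId bson.ObjectId) (bool, error) {
        pattern := bson.RegEx{Pattern: "^" + txnId.Hex() + "_"}
        for _, name := range collections {
            n, err := db.C(name).Find(bson.M{"txn-queue": pattern}).Count()
            if err != nil {
                return false, err
            }
            if n > 0 {
                return true, nil
            }
        }
        return false, nil
    }

    // pruneTxns removes transactions that are in a final state (assumed codes:
    // 5 = aborted, 6 = applied) and are no longer referenced by any document,
    // including the txns.stash collection.
    func pruneTxns(db *mgo.Database, dataCollections []string) error {
        txns := db.C("txns")
        all := append(dataCollections, "txns.stash")

        iter := txns.Find(bson.M{"s": bson.M{"$in": []int{5, 6}}}).Select(bson.M{"_id": 1}).Iter()
        var doc struct {
            Id bson.ObjectId `bson:"_id"`
        }
        for iter.Next(&doc) {
            referenced, err := isReferenced(db, all, doc.Id)
            if err != nil {
                return err
            }
            if !referenced {
                if err := txns.RemoveId(doc.Id); err != nil {
                    return err
                }
            }
        }
        return iter.Close()
    }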

Please let me know if you run into any issues there.



On Tue, May 12, 2015 at 9:21 PM, Menno Smits menno.sm...@canonical.com
wrote:

 Hi again,

 In response to the current production Juju issue, I've been tasked with
 adding something to Juju to keep the size of Juju's txns and txns.log
 collections under control so that they don't grow unbounded.

 The ticket is here: https://bugs.launchpad.net/juju-core/+bug/1453785

 Naively, one might think that transactions could be removed if they were
 (say) over a week old and marked as either applied or aborted but of course
 it's not that simple[1]. I must admit that I don't completely understand
 why this is case, even when I look at the code for mgo/txn. How does a
 pending transaction end up depending on the details of an applied
 transaction?

 Given that a typical Juju system has no maintenance window and there's
 (currently) no way to put a Juju system into a read-only mode can you
 think of any practical way for Juju to prune the txn and txn.stash
 collections?

 Any ideas would be most helpful.

 - Menno

 [1]
 http://grokbase.com/t/gg/mgo-users/13cj7c6kxt/when-to-delete-db-transaction




-- 
gustavo @ http://niemeyer.net
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Please, no more types called State

2015-03-12 Thread Gustavo Niemeyer
When I was new to juju myself, we only had one State, I believe. That
one golden state was supposed to represent the state of the whole
deployment, so it was indeed The State of the system. Having tons of
these indeed sounds awkward.

On Thu, Mar 12, 2015 at 8:08 AM, Michael Foord
michael.fo...@canonical.com wrote:


 On 12/03/15 05:01, David Cheney wrote:

 lucky(~/src/github.com/juju/juju) % pt -i type\ State\ | wc -l

 23

 Thank you.


 When I was new to Juju the fact that we had a central State, core to the
 Juju model, but we had umpteen types called State - so where you saw a State
 you had no idea what it actually was and when someone mentioned State you
 couldn't be sure what they meant - was a significant part of the learning
 curve.

 Perhaps a better solution would have been a better name for the core State.

 Michael



 Dave



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: adding placement directives for ensure-availability

2015-02-24 Thread Gustavo Niemeyer
Hi Nate,

On Tue, Feb 24, 2015 at 2:24 PM, Nate Finch nate.fi...@canonical.com wrote:
(...)
 To support this, we need a way to say "use the default placement policy".
 For this, we propose the keyword "default".  Thus, to fix the above example,
 Bill would type this:

 $ juju ensure-availability --to lxc:1,default
 success output here

What's the full format of the parameter of --to, with all possible details?

 Note that this change in no way fixes all of HA's UX problems, and that it
 actually makes some of the problems a lot more obvious (such as the fact
 that the number of placements you need can be different even for the same
 command, depending on the state of the environment).  This will be fixed
 when we revamp the CLI, but for now we'll have to live with it.

I don't have much context on the problem, but it seems like the
proposal is a change in the design of the CLI. If there are known
problems with the current design, the change might well fix them instead
of making them worse?


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Feedback on a base fake type in the testing repo

2015-02-13 Thread Gustavo Niemeyer
On Fri, Feb 13, 2015 at 2:05 PM, Eric Snow eric.s...@canonical.com wrote:
 As for me, by fake I mean a struct that implements an interface with
 essentially no logic other than to keep track of method calls and
 facilitate controlling their return values explicitly.  For examples
 see the implementations for GCE and in `testing/fake.go`.  Thus in
 tests a fake may be used in place of the concrete implementation of
 the interface that would be used in production.

To me this is a good fake implementation:

https://github.com/juju/juju/tree/master/provider/dummy

 The onus is on the test writer to populate the fake with the correct
 return values such that they would match the expected behavior of the
 concrete implementation.

That's an optimistic view of it, as I described.

 Regardless, I'm convinced that testing needs to include both high
 coverage via isolated unit tests and good enough coverage via full
 stack integration tests.  Essentially we have to ensure that layers
 work together properly and that low-level APIs work the way we expect
 (and don't change unexpectedly).

That's globally agreed. What's at stake is how to do these.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Feedback on a base fake type in the testing repo

2015-02-13 Thread Gustavo Niemeyer
On Fri, Feb 13, 2015 at 3:25 PM, Eric Snow eric.s...@canonical.com wrote:
 This is a mock object under some well known people's terminology [1].

 With all due respect to Fowler, the terminology in this space is
 fairly muddled still. :)

Sure, I'm happy to use any terminology, but I'd prefer to not make one
up just now.

 The most problematic aspect of this approach is that tests are pretty
 much always very closely tied to the implementation, in a way that you
 suddenly cannot touch the implementation anymore without also fixing a
  vast number of tests to comply.

 Let's look at this from the context of unit (i.e. function
 signature) testing.  By implementation do you mean you mean the
 function you are testing, or the low-level API the function is using,
 or both?  If the low-level API then it seems like the real fake
 object you describe further on would help by moving at least part of
 the test setup down out of the test and down into the fake.  However
 aren't you then just as susceptible to changes in the fake with the
 same maintenance consequences?

No, because the fake should behave as a normal type would, instead of
expecting a very precisely constrained orchestration of calls into its
interface. If we hand the implementation a fake value, it should be
able to call that value as many times as it wants, with whatever
parameters it wants, in whatever order it wants, and its behavior
should be consistent with a realistic implementation. Again, see the
dummy provider for a convenient example of that in practice.
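To make the distinction concrete, a small made-up example (the Storage interface and both test doubles are hypothetical, purely for illustration):

    package example

    import "errors"

    // Storage is a hypothetical interface some production code depends on.
    type Storage interface {
        Put(name string, data []byte) error
        Get(name string) ([]byte, error)
    }

    // stubStorage records calls and returns scripted values: every test has to
    // know exactly which calls the code under test will make to set it up.
    type stubStorage struct {
        calls   []string
        getData []byte
        getErr  error
    }

    func (s *stubStorage) Put(name string, data []byte) error {
        s.calls = append(s.calls, "Put "+name)
        return nil
    }

    func (s *stubStorage) Get(name string) ([]byte, error) {
        s.calls = append(s.calls, "Get "+name)
        return s.getData, s.getErr
    }

    // fakeStorage honours the Storage contract with a trivial in-memory
    // implementation, so the code under test can call it in any order and any
    // number of times and still see consistent behaviour.
    type fakeStorage struct {
        files map[string][]byte
    }

    func newFakeStorage() *fakeStorage {
        return &fakeStorage{files: make(map[string][]byte)}
    }

    func (f *fakeStorage) Put(name string, data []byte) error {
        f.files[name] = append([]byte(nil), data...)
        return nil
    }

    func (f *fakeStorage) Get(name string) ([]byte, error) {
        data, ok := f.files[name]
        if !ok {
            return nil, errors.New("not found: " + name)
        }
        return data, nil
    }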

 Ultimately I just don't see how you can avoid depending on low-level
 details (closely tied to the implementation) in your tests and still
 have confidence that you are testing things rigorously.  I think the

I could perceive that in your original email, and it's precisely why
I'm worried and responding to this thread.

If that logic held any ground, we'd never be able to have
organizations that could certify the quality and conformance of
devices based on the device itself. Instead, they'd have to go into
the industries to see how the device was manufactured. But that's not
what it happens.. these organizations get the outcome of the
production line, no matter how that worked, because that's the most
relevant thing to test. You can change the production line, you can
optimize it away, and you can even replace entire components, and it
doesn't matter as long as you preserve the quality of the outcome. Of
course, on the way to producing a device you'll generally make use of
smaller devices, which have their own production lines, and which
ensure that the outcome of their own production lines is of high
quality.

The same thing is true in code. If you spend a lot of time writing
tests for your production line, you are optimizing for the wrong goal.
You are spending a lot of time, the outcome can still be of poor
quality, and you are making it hard to optimize your production line
and potentially replace its components and methods by something
completely different. Of course, as in actual devices, code is
layered, so sub-components can be tested on their own to ensure their
promised interfaces hold water, but even there what matters is
ensuring that what they promise is being satisfied, rather than how
they are doing it.

 Also, the testing world puts a lot of emphasis on branch coverage in
 tests.  It almost sounds like you are suggesting that is not such an
 important goal.  Could you clarify?  Perhaps I'm inferring too much
 from what you've said. :)

I'd be happy to dive into that, but it's a distraction in this
conversation. You can use or not use your coverage tool irrespective
of your testing approach.

 As a recommendation to avoid digging a hole -- one that is pretty
 difficult to climb out of once you're in -- instead of testing method
 calls and cooking fake return values in your own test, build a real
 fake object: one that pretends to be a real implementation of that
 interface, and understands the business logic of it. Then, have
 methods on it that allow tailoring its behavior, but in a high-level
 way, closer to the problem than to the code.

 Ah, I like that!  So to rephrase, instead of a type where you just
 track calls and explicitly control return values, it is better to use
 a type that implements your expectations about the low-level system,
 exposed via the same API as the actual one?  This would likely still
 involve both to implement the same interface, right?  The thing I like

That's right.

 about that approach is that is forces you to document your
 expectations (i.e. dependencies) as code.  The problem is that you pay
 (in development time and in complexity) for an extra layer to engineer

This is irrelevant if you take into account the monumental future cost
of mocking everything up.

 Regardless, as I noted in an earlier message, I think testing needs to 
 involve:

 1. a mix of high branch coverage through isolated unit tests,

I'd be very careful to not 

Re: Feedback on a base fake type in the testing repo

2015-02-13 Thread Gustavo Niemeyer
On Fri, Feb 13, 2015 at 6:50 PM, Eric Snow eric.s...@canonical.com wrote:
 Using a fake for that input means you don't have to encode the
 low-level business logic in each test (just any setup of the fake's
 state).  You can be confident about the low-level behavior during
 tests as matching production operation (as long as the fake
 implementation is correct and bug free).  The potential downsides are
 any performance costs of using the fake, maintaining the fake (if
 applicable), and knowing how to manage the fake's state.
 Consequently, there should be a mechanism to ensure that the fake's
 behavior matches the real thing.

All of these costs exist whether you are dealing with one fake
implementation, or with logic spread through five hundred tests which
all include details of that interface. Hopefully the saner approach is
obvious.

 Alternately you can use a stub (what I was calling a fake) for that
 input.  On the upside, stubs are lightweight, both performance-wise
 and in terms of engineer time.  They also help limit the scope of what
 executes to just the code in the function under test.  The downside is
 that each test must encode the relevant business logic (mapping
 low-level inputs to low-level outputs) into the test's setup.  Not
 only is that fragile but the low-level return values will probably not
 have a lot of context where they appear in the test setup code
 (without comments).

Precisely. And that also implies the test knows exactly how the
function is using the interface, thus leaking its implementation into
the test, and preventing the implementation from changing even in simple
ways without breaking the test.

 I'd be very careful to not overdo this. Covering a line just for the
 fun of seeing the CPU passing there is irrelevant. If you fake every
 single thing around it with no care, you'll have a CPU pointer jumping
 in and out of it, without any relevant achievement.

 I was talking about branches in the source code, not actual CPU branch
 operations. :)

Oh, so you mean you are not testing all CPU branches!?

(/me provokes the inner perfectionist spirit ;-)


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Juju Resources (a tool / library)

2015-02-11 Thread Gustavo Niemeyer
I can, but that's not the right way to proceed if you were in fact
trying to implement an important feature of juju that was extensively
discussed.

1. The project has a technical lead and a manager who should have
the proper information to bootstrap this, or at least know who to talk
to.

2. The project has a roadmap. Make sure to talk to the people in (1)
to see how this fits in.

3. I'm sure there are previous documents about this, given its relevance
and prior conversations. You can find these documents yourself by
searching through specs, or by asking people that participated in
prior conversations and might have a better idea about what to search
for.

4. There should be a specification about the feature, before there is
an implementation. Stakeholders should review the specification and
approve it before there is code.





On Wed, Feb 11, 2015 at 6:40 PM, Cory Johns cory.jo...@canonical.com wrote:
 Can you be more specific on how it differs from the goals of resources
 streams?  As I mentioned in my first email, I asked around to try to
 get specific information about the proposed feature and wasn't able to
 get any concrete answers or documentation.  So I created this based on
 what I remembered from the discussions I'd heard (admittedly not much)
 and what I needed in the charms I was working on.

 I fully intend for this library to be subsumed / obviated by core as
 the feature develops, and I tried to make that clear in the library
 README and documentation.  I also intend to update the interface to
 match the feature as closely as possible as the proposal becomes more
 concrete.

 On Wed, Feb 11, 2015 at 2:33 PM, Gustavo Niemeyer gust...@niemeyer.net 
 wrote:
 Hi Cory,

 While it's fine and welcome to have such test bed features, it feels
 like the proposal and implementation have quite different goals from
 the actual resources feature we've been talking about for a while, so
 as a very early suggestion and request, I would strongly recommend
 renaming the feature to avoid creating ambiguity with the feature that
 we intend juju to have in the long run. Having two resource
 implementations and taking over important namespaces such as
 resources.yaml might introduce unnecessary confusion down the road.
 Instead, the project might have a nice non-generic name, and its
 configuration file could also be named after it.


 On Wed, Feb 11, 2015 at 4:17 PM, Cory Johns cory.jo...@canonical.com wrote:
 Per request, the documentation is now also available on
 ReadTheDocs.org: http://jujuresources.readthedocs.org/

 On Wed, Feb 11, 2015 at 11:25 AM, Cory Johns cory.jo...@canonical.com 
 wrote:
 Hi all,

 (cross-posting to juju  juju-dev)

 I've created a tool / library for organizing and managing resources
 (binary blobs, tarballs, Python packages, and, eventually, apt
 packages) required by a charm.  The idea is to be an interim tool, and
 a test-bed for the resource features that have been discussed for the
 future in Juju core.

 It is available on GitHub [1] and PyPI [2], and the full documentation
 is on PythonHosted.org [3].

 The goals of this project are:

   * Organize all external resource dependencies into a single
 resources.yaml file
   * Provide easy, consistent interfaces to manage, install, and mirror 
 resources
   * CLI and Python bindings
   * Enforce best practices, such as cryptographic hash validation

 I asked around to see if there was an existing proposal for a
 resources.yaml file format, but couldn't find one.  If someone is
 aware of an existing spec / proposal, I'd prefer to match that as much
 as possible.

 The current version is fully functional, and is currently being used
 in the framework refactor of the Apache Hadoop charms (e.g., [4]).

 Note that I created this separately from Charm Helpers primarily
 because I wanted to use it to bootstrap CH, but this also makes it
 easier to use in Bash charms.

  My next step is to add apt-get support, but that will require
 cleaning up the mirror server (possibly converting it to use squid,
 but I may want to keep it self-contained), and learning a bit more
 about how the apt proxy settings work.  Advice here is appreciated.


 [1] https://github.com/juju-solutions/jujuresources
 [2] https://pypi.python.org/pypi/jujuresources
 [3] http://pythonhosted.org/jujuresources/
 [4] 
 https://code.launchpad.net/~bigdata-dev/charms/trusty/apache-hadoop-hdfs-master/trunk

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: 
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



 --

 gustavo @ http://niemeyer.net



-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: supplement open--port/close-port with ensure-these-and-only-these-ports?

2014-11-03 Thread Gustavo Niemeyer
Reminding people of everything they should *not be doing* to get a feature
to be listed in the release notes is very ineffective.

What should they *be doing* instead, and why will the process work in the
future when it clearly has failed before, despite the assumed good
intention we should assume all trusted developers to have?


On Mon Nov 03 2014 at 1:20:30 PM Curtis Hovey-Canonical 
cur...@canonical.com wrote:

 On Sat, Nov 1, 2014 at 2:08 PM, Kapil Thangavelu
 kapil.thangav...@canonical.com wrote:
 
 
  On Sat, Nov 1, 2014 at 12:58 PM, John Meinel j...@arbash-meinel.com
 wrote:
 
  I believe there is already opened-ports to tell you what ports Juju is
  currently tracking.
 
 
   That's cool and news to me, it looks like it landed in trunk earlier on
   October 2nd (i.e. 1.21) and hasn't made release notes or docs yet. Especially
   for charm environment changes we really need corresponding docs, as charm env
   changes are not easily discoverable otherwise. Really great to see that
   land as it's been a common issue for charms and one that previously forced
   them into state management.

  :( How will anything get into the release notes if engineers don't
  announce the new feature when it merges? It is not in the release
  notes because engineers haven't described it.

 Asking questions to this list to discover new features isn't very
 efficient.


 --
 Curtis Hovey
 Canonical Cloud Development and Operations
 http://launchpad.net/~sinzui

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: https://lists.ubuntu.com/
 mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: how to update dependencies.tsv

2014-10-30 Thread Gustavo Niemeyer
I have never used upstream as an actual remote name. I see people
commonly using the term as a wildcard to refer to the upstream branch
whatever it is. The term is also used widely in git itself with the same
meaning, including in the command line interface. For example, you set the
upstream branch with --set-upstream (or -u for short), and in most cases
people set their origin branch as upstream.

Most posts in StackOverflow follow that:

http://stackoverflow.com/search?q=%5Bgit%5D+upstream

This confirms what Roger pointed out: upstream is well established as a
concept, not as a remote label, so it's best to use a well defined name
that points out where the code was taken from, rather than overloading the
term to mean something else.


On Thu Oct 30 2014 at 7:47:49 AM Nate Finch nate.fi...@canonical.com
wrote:

 Upstream and origin are very very common in the git world. Most any how to
 or stack overflow answer uses those by default. Origin is your repo and
 upstream is the repo you branched from.   I started out doing it your way,
 Roger, since I agree that information does flow both ways, and naming my
 repo after myself made sense, but I got so annoyed with every answer I
 looked up using origin and upstream that I changed to just use those terms.

 Using standard terms is a good thing so we all know what we're talking
 about.
 On Oct 30, 2014 4:22 AM, roger peppe roger.pe...@canonical.com wrote:

 On 29 October 2014 21:03, Tim Penhey tim.pen...@canonical.com wrote:
  On 30/10/14 01:11, roger peppe wrote:
  A better solution here, which I've been meaning to do for a while,
  would be to change godeps so that it can explore all possible
  targets. I had a go at that this morning (just adding all tags to
  build.Context) but it's not quite as easy as that. I should
  be able to fix it soon though.
 
  While you are looking at godeps, I don't suppose you can fix it so it
  looks for the upstream remote?

 As things currently are, godeps doesn't know about any remote
 in particular, and I think that's probably correct - it just uses
 git fetch (with no arguments) to fetch, and relies on the
 defaults for that.

  I was told that we should have the origin remote being our personal
  github repo and upstream being the team repo.

 I actually think that this is not a great way to configure things.
 When you clone a git repository (for example by doing go get)
 there is only one remote configured, and that's origin.

 So if I changed godeps to pull from upstream, it would have to
 fall back to pulling from origin in this, the most common case.

 Personally, I find the very word upstream confusing when
 used in this area - information flows both ways. The
 one certainty is that everything is destined for the
 main repo, so naming that origin makes sense to me.

 I never create a repo named upstream - I have origin
 and I name other remotes after github users, e.g. rogpeppe,
 which seems to scale better when I'm collaborating with
 other people.

  When godeps tries to pull in new revisions into a repo where I have the
  remotes set as I was told to, godeps fails to pull in new revisions and
  I normally do something like:
 
(cd ../names  git fetch upstream master)
 
  Then run the godeps command again.

 All the above said, I don't think there's anything stopping you from using
 this. Just do:

 git branch --set-upstream-to upstream/master

 and I think it should work (though I haven't actually tried it)

   cheers,
 rog.

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: https://lists.ubuntu.com/
 mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Actions :: UUID vs. Tag on command line

2014-10-24 Thread Gustavo Niemeyer
The tag (which might be better named internal id) looks like an
implementation detail which doesn't seem right to expose. I'd suggest
either giving it a proper representation that the user can understand (a
sequential action number, for example), or use a hash. I'd also not use a
UUID, btw, but rather just a unique hash.



On Fri Oct 24 2014 at 2:55:45 PM John Weldon johnweld...@gmail.com wrote:

 Hi;

 The current actions spec
 https://docs.google.com/a/canonical.com/document/d/14W1-QqB1pXZxyZW5QzFFoDwxxeQXBUzgj8IUkLId6cc/edit?usp=sharing
 indicates that the actions command line should return a UUID as the
 identifier for an action once it's been en-queued using 'juju do action'.


 Is there a compelling reason to use UUID's to identify actions, versus
 using the string representation of the Tag?


 A UUID would require a command something like:
   juju status action:9e1e5aa0-5b9d-11e4-8ed6-0800200c9a66

 which maybe we could shorten to:
   juju status action:9e1e5aa0



 I would prefer something like:
   juju status action:mysq/0_a_3

 which would be the string representation of the actions Tag.



 Is there a compelling reason to use UUID?

 Cheers,

 --
 John Weldon
  --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: https://lists.ubuntu.com/
 mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Actions :: UUID vs. Tag on command line

2014-10-24 Thread Gustavo Niemeyer
It was my mistake to call it a hash.. it may be just a random id, in hex
form. Alternatively, use a service-specific sequence number so it's better
suited to humans. In the latter case, the sequence number must
realistically reflect the sequence in which the actions are submitted to
units, otherwise it would be confusing.
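A minimal sketch of the "random id in hex form" option (purely illustrative; eight random bytes are used here just as an example size):

    package main

    import (
        "crypto/rand"
        "encoding/hex"
        "fmt"
    )

    // newActionId returns a short random identifier in hex form, which
    // implies no ordering the way a sequence number would.
    func newActionId() (string, error) {
        buf := make([]byte, 8)
        if _, err := rand.Read(buf); err != nil {
            return "", err
        }
        return hex.EncodeToString(buf), nil
    }

    func main() {
        id, err := newActionId()
        if err != nil {
            panic(err)
        }
        fmt.Println("action id:", id)
    }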
On Fri Oct 24 2014 at 3:51:04 PM John Weldon johnweld...@gmail.com wrote:

 Thanks Gustavo;

 I think a hash would be good too.  I'll see what I can find in the juju
 code base around hash representations of id's, or come up with something.
 Any suggestions on how to generate and translate the hash are welcome too.

 Cheers,


 --
 John Weldon

 On Fri, Oct 24, 2014 at 10:41 AM, Gustavo Niemeyer 
 gustavo.nieme...@canonical.com wrote:

 The tag (which might be better named internal id) looks like an
 implementation detail which doesn't seem right to expose. I'd suggest
 either giving it a proper representation that the user can understand (a
 sequential action number, for example), or use a hash. I'd also not use a
 UUID, btw, but rather just a unique hash.



 On Fri Oct 24 2014 at 2:55:45 PM John Weldon johnweld...@gmail.com
 wrote:

 Hi;

 The current actions spec
 https://docs.google.com/a/canonical.com/document/d/14W1-QqB1pXZxyZW5QzFFoDwxxeQXBUzgj8IUkLId6cc/edit?usp=sharing
 indicates that the actions command line should return a UUID as the
 identifier for an action once it's been en-queued using 'juju do action'.


 Is there a compelling reason to use UUID's to identify actions, versus
 using the string representation of the Tag?


 A UUID would require a command something like:
   juju status action:9e1e5aa0-5b9d-11e4-8ed6-0800200c9a66

 which maybe we could shorten to:
   juju status action:9e1e5aa0



 I would prefer something like:
   juju status action:mysq/0_a_3

 which would be the string representation of the actions Tag.



 Is there a compelling reason to use UUID?

 Cheers,

 --
 John Weldon
  --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: https://lists.ubuntu.com/
 mailman/listinfo/juju-dev



-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Actions :: UUID vs. Tag on command line

2014-10-24 Thread Gustavo Niemeyer
I doubt this would work. There's no way in the transaction package for you
to generate an id and reference that same id in other fields in one go.

In other cases that's not an issue, but having a sequence of numbered
actions where 10 is applied before 9 would be awkward.


On Fri Oct 24 2014 at 4:10:30 PM John Weldon johnweld...@gmail.com wrote:

 That's a good question; The sequence relies on the same mechanism in state
 that is used to generate other sequences.  I believe it's done in a
 transaction using the provided key (in this case the id of the unit).

 Cheers,

 --
 John Weldon

 On Fri, Oct 24, 2014 at 11:07 AM, Gustavo Niemeyer 
 gustavo.nieme...@canonical.com wrote:

 That might be okay, but is the sequence really respected?  In other
 words, what happens if two independent clients attempt to submit an action
 for the same service? Will the two generated sequences reflect the order in
 which the actions are submitted to the units at the end of the pipeline?


 On Fri Oct 24 2014 at 4:05:03 PM John Weldon johnweld...@gmail.com
 wrote:

 Sure, that makes sense.  Right now the Tag encodes a legitimate
 sequence.  We should probably just clean up the representation so it
 doesn't expose the internals and just exposes the unit and action sequence
 number.


 --
 John Weldon

 On Fri, Oct 24, 2014 at 10:58 AM, Gustavo Niemeyer 
 gustavo.nieme...@canonical.com wrote:

 It was my mistake to call it a hash.. it may be just a random id, in
 hex form. Alternatively, use a service-specific sequence number so it's
 better suited to humans. In the latter case, the sequence number must
 realistically reflect the sequence in which the actions are submitted to
 units, otherwise it would be confusing.

 On Fri Oct 24 2014 at 3:51:04 PM John Weldon johnweld...@gmail.com
 wrote:

 Thanks Gustavo;

 I think a hash would be good too.  I'll see what I can find in the
 juju code base around hash representations of id's, or come up with
 something.
 Any suggestions on how to generate and translate the hash are welcome
 too.

 Cheers,


 --
 John Weldon

 On Fri, Oct 24, 2014 at 10:41 AM, Gustavo Niemeyer 
 gustavo.nieme...@canonical.com wrote:

 The tag (which might be better named internal id) looks like an
 implementation detail which doesn't seem right to expose. I'd suggest
 either giving it a proper representation that the user can understand (a
 sequential action number, for example), or use a hash. I'd also not use a
 UUID, btw, but rather just a unique hash.



 On Fri Oct 24 2014 at 2:55:45 PM John Weldon johnweld...@gmail.com
 wrote:

 Hi;

 The current actions spec
 https://docs.google.com/a/canonical.com/document/d/14W1-QqB1pXZxyZW5QzFFoDwxxeQXBUzgj8IUkLId6cc/edit?usp=sharing
 indicates that the actions command line should return a UUID as the
 identifier for an action once it's been en-queued using 'juju do 
 action'.


 Is there a compelling reason to use UUID's to identify actions,
 versus using the string representation of the Tag?


 A UUID would require a command something like:
   juju status action:9e1e5aa0-5b9d-11e4-8ed6-0800200c9a66

 which maybe we could shorten to:
   juju status action:9e1e5aa0



 I would prefer something like:
   juju status action:mysq/0_a_3

 which would be the string representation of the actions Tag.



 Is there a compelling reason to use UUID?

 Cheers,

 --
 John Weldon
  --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: https://lists.ubuntu.com/
 mailman/listinfo/juju-dev





-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Actions :: UUID vs. Tag on command line

2014-10-24 Thread Gustavo Niemeyer
As a side note, and a bikeshed-prone rant which I won't embrace, naming it
tag feels like a mistake.

On Fri Oct 24 2014 at 4:13:14 PM William Reade william.re...@canonical.com
wrote:

 On Fri, Oct 24, 2014 at 8:04 PM, John Weldon johnweld...@gmail.com
 wrote:
  Sure, that makes sense.  Right now the Tag encodes a legitimate sequence.
  We should probably just clean up the representation so it doesn't expose
 the
  internals and just exposes the unit and action sequence number.

 Yeah, that works for me. Please don't expose tags in the UI -- as
 gustavo says, they're implementation details. The only critically
 important property of a tag is that it be a *unique* entity identifier
 for API use -- and that requirement is generally at odds with a
 pleasant UX.

 But, yes, if the user representation happens to have a clean 2-way
 mapping with the relevant tags, that makes life easier in some
 respects, and I certainly won't complain about that.

 Cheers
 William

 
 
  --
  John Weldon
 
  On Fri, Oct 24, 2014 at 10:58 AM, Gustavo Niemeyer
  gustavo.nieme...@canonical.com wrote:
 
  It was my mistake to call it a hash.. it may be just a random id, in hex
  form. Alternatively, use a service-specific sequence number so it's
 better
  suited to humans. In the latter case, the sequence number must
 realistically
  reflect the sequence in which the actions are submitted to units,
 otherwise
  it would be confusing.
 
  On Fri Oct 24 2014 at 3:51:04 PM John Weldon johnweld...@gmail.com
  wrote:
 
  Thanks Gustavo;
 
  I think a hash would be good too.  I'll see what I can find in the juju
  code base around hash representations of id's, or come up with
 something.
  Any suggestions on how to generate and translate the hash are welcome
  too.
 
  Cheers,
 
 
  --
  John Weldon
 
  On Fri, Oct 24, 2014 at 10:41 AM, Gustavo Niemeyer
  gustavo.nieme...@canonical.com wrote:
 
  The tag (which might be better named internal id) looks like an
  implementation detail which doesn't seem right to expose. I'd suggest
 either
  giving it a proper representation that the user can understand (a
 sequential
  action number, for example), or use a hash. I'd also not use a UUID,
 btw,
  but rather just a unique hash.
 
 
 
  On Fri Oct 24 2014 at 2:55:45 PM John Weldon johnweld...@gmail.com
  wrote:
 
  Hi;
 
  The current actions spec indicates that the actions command line
 should
  return a UUID as the identifier for an action once it's been
 en-queued using
  'juju do action'.
 
  Is there a compelling reason to use UUID's to identify actions,
 versus
  using the string representation of the Tag?
 
 
  A UUID would require a command something like:
juju status action:9e1e5aa0-5b9d-11e4-8ed6-0800200c9a66
 
  which maybe we could shorten to:
juju status action:9e1e5aa0
 
 
 
  I would prefer something like:
juju status action:mysq/0_a_3
 
  which would be the string representation of the actions Tag.
 
 
 
  Is there a compelling reason to use UUID?
 
  Cheers,
 
  --
  John Weldon
  --
  Juju-dev mailing list
  Juju-dev@lists.ubuntu.com
  Modify settings or unsubscribe at:
  https://lists.ubuntu.com/mailman/listinfo/juju-dev
 
 
 
 
  --
  Juju-dev mailing list
  Juju-dev@lists.ubuntu.com
  Modify settings or unsubscribe at:
  https://lists.ubuntu.com/mailman/listinfo/juju-dev
 

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Actions :: UUID vs. Tag on command line

2014-10-24 Thread Gustavo Niemeyer
Both of these assumptions are incorrect. Please do not assume there's a
single person managing an environment, and the fact that the sequence is
generated outside of the transaction that adds the action is proof that
actions will be arbitrarily executed rather than in the sequence suggested
by the numbers.

On Fri Oct 24 2014 at 4:21:30 PM John Weldon johnweld...@gmail.com wrote:

 Forgot to reply-all

 -- Forwarded message --
 From: John Weldon johnweld...@gmail.com
 Date: Fri, Oct 24, 2014 at 11:19 AM
 Subject: Re: Actions :: UUID vs. Tag on command line
 To: Gustavo Niemeyer gustavo.nieme...@canonical.com



 On Fri, Oct 24, 2014 at 11:14 AM, Gustavo Niemeyer 
 gustavo.nieme...@canonical.com wrote:

 I doubt this would work. There's no way in the transaction package for
 you to generate an id and reference that same id in other fields in one go.

 In other cases that's not an issue, but having a sequence of numbered
 actions where 10 is applied before 9 would be awkward.



 Interesting.

 1. The sequence is generated in a separate transaction before being used.
 (state/sequence.go)  So I don't think your concern about obtaining and
 using in one transaction will be an issue.
 2. We have not had much discussion around strict ordering of actions being
 run in the order they were queued.  My impression is that two different
 users interacting with the system at the same time is a bit of an edge case.

 --
 John Weldon




Re: Actions :: UUID vs. Tag on command line

2014-10-24 Thread Gustavo Niemeyer
On Fri Oct 24 2014 at 4:30:38 PM John Weldon johnweld...@gmail.com wrote:

 Ordered execution wasn't addressed in the spec, and we haven't had much
 discussion about it.
 I'm not even sure how to enforce ordered execution unless we rely on the
 creation timestamp.


Specifications are guidelines. If there are open issues in the
specifications, it does not mean that it is okay to do anything in that
area, but rather that either it should be done in the obviously correct
way, or that a conversation should be raised if the correct way is not
obvious.

If someone sends an action, and then sends another action, to me it's clear
that the first action should be executed before the second action. If the
implementation is not doing that, it should.

If two people send two actions concurrently, by definition there's no order
implied by their use of the system, and so it's impossible to guarantee
which one will be executed first.


 Assuming we have a way to enforce ordered execution, and if that ordering
 is not using the sequence number that is generated, then does exposing that
 sequence number just introduce confusion?


How do you feel about postgres action 103 executing before postgres
action 102?  I personally feel like it's a bug.


 i.e. are we back to just showing some sort of hash / hex sequence as the
 id to avoid implying an order by the sequence number?


Either option sounds fine to me. I'm only suggesting that if you do use
sequence numbers, you're implying a sequence, and people in general are
used to being 35 years old only after they've been 34.


Re: Actions :: UUID vs. Tag on command line

2014-10-24 Thread Gustavo Niemeyer
For 2, it doesn't matter much if the timestamp is taken into account. The
server may simply enqueue the action as it receives it and respond back
only afterwards. This will guarantee read-your-writes consistency, and thus
proper ordering assuming the server does use a queue rather than an
unordered set.

On Fri Oct 24 2014 at 4:44:03 PM John Weldon johnweld...@gmail.com wrote:

 Agreed completely;

 My take away -

 1. Actions en-queued by the same client MUST execute in the order
 en-queued.
 2. Actions en-queued by different clients SHOULD execute in timestamp
 order?
 3. Action IDs should not mislead users by implying sequence that does not
 exist.
 4. ergo Action id's will probably be reflected back to the user in some
 sort of a manageable hash or hex format



 --
 John Weldon

 On Fri, Oct 24, 2014 at 11:38 AM, Gustavo Niemeyer gust...@niemeyer.net
 wrote:



 On Fri Oct 24 2014 at 4:30:38 PM John Weldon johnweld...@gmail.com
 wrote:

 Ordered execution wasn't addressed in the spec, and we haven't had much
 discussion about it.
 I'm not even sure how to enforce ordered execution unless we rely on the
 creation timestamp.


 Specifications are guidelines. If there are open issues in the
 specifications, it does not mean that it is okay to do anything in that
 area, but rather that either it should be done in the obviously correct
 way, or that a conversation should be raised if the correct way is not
 obvious.

 If someone sends an action, and then sends another action, to me it's
 clear that the first action should be executed before the second action. If
 the implementation is not doing that, it should.

 If two people send two actions concurrently, by definition there's no
 order implied by their use of the system, and so it's impossible to
 guarantee which one will be executed first.


 Assuming we have a way to enforce ordered execution, and if that
 ordering is not using the sequence number that is generated, then does
 exposing that sequence number just introduce confusion?


 How do you feel about postgres action 103 executing before postgres
 action 102?  I personally feel like it's a bug.


 i.e. are we back to just showing some sort of hash / hex sequence as the
 id to avoid implying an order by the sequence number?


 Either option sounds fine to me. I'm only suggesting that if you do use
 sequence numbers, you're implying a sequence, and people in general are
 used to being 35 years old only after they've been 34.





Re: Unit Tests Integration Tests

2014-09-11 Thread Gustavo Niemeyer
On Thu, Sep 11, 2014 at 4:06 PM, Mark Ramm-Christensen (Canonical.com)
mark.ramm-christen...@canonical.com wrote:
 But they are not the ONLY reasons why they are valuable.
 There are plenty of others -- performance, test-code cleanliness/re-use,
 result granularity, etc.

Performance is the second reason Roger described, and I disagree that
mocking code is cleaner.. these are two orthogonal properties, and
it's actually pretty easy to have mocked code being extremely
confusing and tightly bound to the implementation. It doesn't _have_
to be like that, but this is not a reason to use it.

 Like any tools, developers can over-use, or mis-use them.   But, if you
 don't use them at all,

That's not what Roger suggested either. A good conversation requires
properly reflecting the position held by participants.

 you often end up with what I call the "binary test suite" in which one
 coding error somewhere creates massive test failures.

A coding error that creates massive test failures is not a problem, in
my experience using both heavily mocking and heavily non-mocking code
bases. It rarely goes into the repository in the first place, because
it's a massive breakage, and when it does go in due to differences in
environment, it's easy to spot the root of the failure because proper
code is layered.

(...)
 My belief is that you need both small, fast, targeted tests (call them unit
 tests) and large, realistic, full-stack tests (call them integration tests)
 and that we should have infrastructure support for both.

Yep, but that's beside the point being made. You can do unit tests
which are small, fast, and targeted, with or without mocking, and
without mocking they can be realistic, which is a good thing. If you
haven't had a chance to see tests falsely passing with mocking, that's
a good thing too.. you haven't abused mocking too much yet.


gustavo @ http://niemeyer.net



Re: Unit Tests Integration Tests

2014-09-11 Thread Gustavo Niemeyer
On Thu, Sep 11, 2014 at 10:42 PM, Andrew Wilkins
andrew.wilk...@canonical.com wrote:
 I basically agree with everything below, but strongly disagree that mocking
 implies you know exactly what the code is doing internally. A good interface

I'm also in agreement about your points. But just so you understand
where Roger is coming from, the term mocking is often [1] associated
with a test style that does bind very closely to what the code does.
But you're probably using the term more loosely for test doubles in
general, and I'm all for not being pedantic, so yes, +1 to the
intention of what you've said.

[1] http://martinfowler.com/articles/mocksArentStubs.html


gustavo @ http://niemeyer.net



Re: Please don't use bash when there are syscalls available

2014-09-09 Thread Gustavo Niemeyer
Worth keeping in mind the usual gotcha: the API of syscall is
different for different OSes.
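
For the Linux case specifically, a minimal sketch of the syscall-based
approach (the function name and lock path are illustrative; the errno check
is what replaces parsing localized stderr from the flock binary):

package main

import (
	"fmt"
	"os"
	"syscall"
)

// tryLock takes a non-blocking exclusive lock via flock(2) and inspects the
// errno. The returned file must stay open for as long as the lock is held.
func tryLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		if err == syscall.EWOULDBLOCK {
			return nil, fmt.Errorf("lock %s: already held by another process", path)
		}
		return nil, err
	}
	return f, nil
}

func main() {
	f, err := tryLock("/tmp/example.lock")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close() // closing the file releases the lock
	fmt.Println("lock acquired")
}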

On Tue, Sep 9, 2014 at 5:45 PM, Nate Finch nate.fi...@canonical.com wrote:
 A user just complained that he can't bootstrap because Juju is parsing
 stderr text from flock, and his server isn't in English, so the error
 message isn't matching.

 https://github.com/juju/juju/blob/master/environs/sshstorage/storage.go#L254

 Now, I think we all know that parsing error text is a bad idea, but I think
 I understand why it was done - it looks like flock the application only
 returns 1 on this failure, so it's not exactly a unique error code.
 However, flock the system call returns several different error codes, which
 are quite unique and easy to handle in a way that is not dependent on the
 language of the machine.

 It also happens to be already implemented in the syscall package:

 http://golang.org/pkg/syscall/#Flock

 So let's fix this, and try not to call out to bash unless there's
 absolutely no other way.

 -Nate





-- 

gustavo @ http://niemeyer.net



Re: Commented-out tests?

2014-08-29 Thread Gustavo Niemeyer
On Fri, Aug 29, 2014 at 4:28 PM, Katherine Cox-Buday
katherine.cox-bu...@canonical.com wrote:
 Hey all,

 I ran into some commented out tests while making a change:
 https://github.com/juju/juju/pull/630/files#r16874739

 I deleted them since keeping things around that we might need later is the
 job of source control, not comments ;)

If they were relevant tests, removing them generally means they're never
coming back. The best course of action might be to Skip them or to use
ExpectFailure, providing an appropriate reason string. This makes it
visible that there are tests not being run or failing, while still
making sure they at least build.
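
As a minimal gocheck-style sketch (the suite, test, and reason here are
made up, and the gopkg.in/check.v1 import path is assumed):

package upgrade_test

import (
	"testing"

	gc "gopkg.in/check.v1"
)

func Test(t *testing.T) { gc.TestingT(t) }

type someSuite struct{}

var _ = gc.Suite(&someSuite{})

func (s *someSuite) TestNotReadyYet(c *gc.C) {
	// Skipping (or using c.ExpectFailure with a reason) keeps the test
	// compiling and keeps the gap visible in the output, unlike a
	// commented-out block.
	c.Skip("disabled until the underlying behaviour is fixed")
	c.Fatal("real assertions would live here")
}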

Of course, if they're indeed never coming back, then just removing
them for good is more honest.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-20 Thread Gustavo Niemeyer
On Wed, Aug 20, 2014 at 5:46 AM, Matthew Williams
matthew.willi...@canonical.com wrote:
 if [ "$JUJU_HOOK_NAME" = "start" ]; then
   # run start
 elif [ "$JUJU_HOOK_NAME" = "config-changed" ]; then
   # run config-changed
 elif [ "$JUJU_HOOK_NAME" = "stop" ]; then
   # run stop
 else
   # unknown hook
   exit 1
 fi

I'd expect the else to be exit 0. This is the same behavior you get
when juju would execute a hook but it does not exist in the charm.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-20 Thread Gustavo Niemeyer
On Wed, Aug 20, 2014 at 11:08 AM, William Reade
william.re...@canonical.com wrote:
 On Wed, Aug 20, 2014 at 10:46 AM, Matthew Williams
 matthew.willi...@canonical.com wrote:
 Gustavo's observation about hooks that the charm might not know about yet
 means that the else clause is absolutely required, I wonder if that's
 obvious to someone who's new to charming?


 I'm pretty much adamant that we shouldn't even run new hooks, or expose new
 tools, unless the charm explicitly declares it knows about them. But I do
 imagine that many implementations will want the else anyway: they don't need
 to provide an implementation for every single hook.

But we're talking about default-hook, which is supposed to run when
things are missing?  Actually, we should probably call this
missing-hook as originally suggested, to make it more obvious that
this is being called because some arbitrary hook was not found. It'll
probably convey the importance of handling unknowns in a sane way more
clearly.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-20 Thread Gustavo Niemeyer
On Wed, Aug 20, 2014 at 11:16 AM, Nate Finch nate.fi...@canonical.com wrote:
 Anyone who has ever written a switch statement should be used to putting in
 a default clause for something I don't expect... I don't think it should
 be a big deal.

Some charms mentioned in this thread miss the switch altogether. Given
the conversation so far, it doesn't feel like we really understand how
people are organizing their charms today, nor how they are supposed to
be using the missing-hook. For example, you said in the opening
message that "Many charms these days only contain one real hook
script, and the rest are all just symlinks to the real one", and I'm
yet to see a charm with *one* hook alone.

Marco had a noble offer that we should accept:

The majority, if not all, of charms that currently implement this
pattern do so by either using charm-helpers or by having a giant
if/else case statement at the bottom of the hook which maps which code
should execute with each hook that has invoked the symlink'd file. I
can take a survey of current charms which use symlinks to see if any
don't fit this pattern.

Yes, it would be good to have proper data on what charms are doing
today, and how they are supposed to work in that new world. It would
also be good to understand what using charm-helpers means. The
charms discussed above would _not_ work well with a missing-hook
implementation that dispatched on every hook. They would have to be
adapted to it.

Multiple people also mentioned in this thread that maybe it should not
dispatch on all hooks. What does that mean? Which hooks would it
dispatch on, and where is the line? Why?

On Wed, Aug 20, 2014 at 11:50 AM, Nate Finch nate.fi...@canonical.com wrote:
 I would expect a lot of people will implement their charms as a single
 script (especially given the number of charms we've seen implemented that
 way even with minimal support for it).  If the special hook file is called
 default-hook, it makes those single-script charms seem like less of a hack
 than if the single file is called missing-hook.  It would also make more
 sense to a new charm author, I think.

It's not a hack.. it's subtle, and that's the reason why it should be
called missing-hook. It _is_ subtle. People must be aware that there
is a multitude of events dispatched to that one executable,
potentially with events they do not expect, and they must be aware
that by creating a different hook they will prevent that one
executable from receiving that event. That's what missing-hook
conveys to me. If you think that's too subtle, maybe we need a
different proposal.

 One possibility is to give the charm author the ability to specify the name
 of the default/missing hook file in the charm metadata... this could serve

You mean the same way we have a configuration file in Go that defines
how we want our main() function to be called?  How reasonable does
that feel?


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-20 Thread Gustavo Niemeyer
On Wed, Aug 20, 2014 at 3:45 PM, Nate Finch nate.fi...@canonical.com wrote:
 Here's a proposal that is much simpler: we add a flag to the charm metadata,
 called something like single_hook.  When single_hook is true, all hook
 events run a file called default-hook (or whatever we want to call it, I
 don't really care).  $JUJU_HOOK_NAME will be set with the name of the hook
 that is running.  That's it.  What the charm authors do after the hook file
 gets run is up to them.

That sounds reasonable. We could make both the hook name and the charm
metadata flag be single-hook.

But does it solve people's problems?  Would people that today use half
of the hooks symlinked and half of them without symlinks transition to
that model, or is symlinking more convenient?  What about people using
charm helpers without a dispatch table, such as the case Aaron raised
in this thread?  Their charms would be broken (or will eventually be
broken) without a dispatch table. Would they transition or would they
stick to current practices?

 In the bug's comments, there's discussion about a lack of discoverability
 for what hooks the charm has... but honestly, if you need to know what the
 hooks do, you have to read the code anyway. Hopefully knowing what hooks a
 charm has shouldn't be necessary to use the charm (if using Juju requires
 you to read a charm's code... we're doing something wrong).

We're also doing something wrong if knowing what a hook is supposed to
do requires reading the code.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-20 Thread Gustavo Niemeyer
On Wed, Aug 20, 2014 at 5:05 PM, Nate Finch nate.fi...@canonical.com wrote:
 I think to answer most of these questions, we need more information about
 what the existing charms do, and input from the charmers themselves.

 Here's the info from Marco: http://pastebin.ubuntu.com/8100649/

Thanks. Looking at some entries from that list I can definitely see
how single-hook would be useful, and it looks like it would also work
well with the defined semantics.

 Numbers:

 56/162 charms use symlinks
 6 of those are only partially symlinked
 50 of those use symlinks for all hooks

Given those numbers, and the pattern described above, I'd definitely
try to have the enforced single hook model you described last, which
must be explicitly enabled to work, and where everything is run only
through it when it is indeed enabled. Easier to implement, and to
understand as well.

Addressing Aaron's remark, the hook might be called dispatch so that it
conveys the intended semantics rather than its uniqueness, and the
metadata flag dispatch-hook: bool.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-19 Thread Gustavo Niemeyer
On Tue, Aug 19, 2014 at 9:07 AM, William Reade
william.re...@canonical.com wrote:
 On Mon, Aug 18, 2014 at 9:33 PM, Gustavo Niemeyer gust...@niemeyer.net
 wrote:

 I don't think I fully understand the proposal there. To have such a
 something-changed hook, we ought to have a better mechanism to tell
 *what* actually changed. In other words, we have a number of hooks
 that imply a state transition or a specific notification (install,
 start, config-changed, leader-elected coming, etc). Simply
 calling out the charm saying stuff changed feels like a bad
 interface, both in performance terms (we *know* what changed) and in
 user experience (how do people use that!?).

 The issue is that as charms increase in sophistication, they seem to find it
 harder and harder to meaningfully map specific changes onto specific
 actions. Whether or not to react to a change in one relation depends on the
 values of a bunch of other units in other relations, to the extent that any
 individual relation change can have arbitrarily far-reaching consequences,
 and it ends up being easier to simply write something that maps directly
 from complete-available-state to desired-config.

I have myself never seen a single charm that completely ignores all
the action cues to simply re-read the whole state from the ground up,
and we've just heard in this thread people claiming that even the
charms that use a single hook via symlinks still rely on a dispatching
table based on what action is happening, so I'm not ready to accept
that claim at face value without some actual data.

What percentage of the charms we have completely ignore the actions
that are taking place when making decisions?

   * leader-deposed will completely lack hook tools: we can't run a
 default-hook there unless we know for sure that the implementation doesn't
 depend on any hook tools (in general, this is unlikely).

Why?  People can still run hook tools in leader-deposed, and they will
not work. The situation is no different with default-hook: they are
just two files in the same directory. Run one instead of the other.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-19 Thread Gustavo Niemeyer
On Tue, Aug 19, 2014 at 12:41 PM, William Reade
william.re...@canonical.com wrote:
 (out of interest, if started/stopped state were communicated to you any
 other way, would you still need these?)

If you communicate events in a different way, you obviously won't need
your previous way of communicating events.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-19 Thread Gustavo Niemeyer
On Tue, Aug 19, 2014 at 1:10 PM, Aaron Bentley
aaron.bent...@canonical.com wrote:
 True.  At that point, the pattern is not a win, but it's not much of a
 loss.  Changing the web site relation is extremely uncommon, but
 operations which do require server restarts are quite common.  So
 making an exception for the web site relation can be seen as a
 micro-optimization.

Restarting a process and killing all on-going activity is a big deal
more often than not, for realistic services.

 True, I didn't call out the exceptions for the charmworld charm.
 For completeness, the exceptions in charmworld are:

 Yeah, it definitely depends on knowing the events still.

 On the other hand, it doesn't depend on knowing the events for
 database relation, search engine relation and configuration changes.

The point I was trying to convey is not that you can merge or ignore
certain events. The system was designed so that this was possible in
the first place. The point is rather that the existing event system is
convenient and people rely on it, so I don't buy that a
something-changed hook is what most people want at this point. At
the same time, that's not an argument _against_ it either. If you're
happy with your design, and that'd help you, and William thinks this
can be conveniently implemented, I'm all for making people's lives
easier.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-19 Thread Gustavo Niemeyer
On Tue, Aug 19, 2014 at 6:58 PM, Matthew Williams
matthew.willi...@canonical.com wrote:
 Something to be mindful of is that we will shortly be implementing a new
 hook for metering (likely called collect-metrics). This hook differs
 slightly to the others in that it will be called periodically (e.g. once
 every hour) with the intention of sending metrics for that unit to the state
 server.

 I'm not sure it changes any of the details in this feature or the pr - but I
 thought you should be aware of it

Yeah, that's a good point. I wonder how reliable the use of
default-hook will be, as it's supposed to run whenever any given hook
doesn't exist, so charms using that feature should expect _any_ hook
to be called there, even those they don't know about, or that don't
even exist yet. The charms that symlink into a single hook seem to be
symlinking a few things, not everything. It may well turn out that
default-hook will lead to brittle charms.


gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-18 Thread Gustavo Niemeyer
Rather than passing it as the first argument, I suggest introducing an
environment variable: $JUJU_HOOK_NAME. This would be set irrespective
of how the hook is being called, so that the same hook can be used
both as a symlink and as a default-hook, unchanged. It also means further
spawned processes get a chance to tell the context they're running under.
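
As a minimal sketch of what that buys a single-script charm (the hook names
here are just examples, and the dispatch bodies are left as comments):

package main

import (
	"fmt"
	"os"
)

func main() {
	switch hook := os.Getenv("JUJU_HOOK_NAME"); hook {
	case "install", "start":
		// ... bring the service up ...
	case "config-changed":
		// ... re-render configuration and restart if needed ...
	case "stop":
		// ... shut the service down ...
	default:
		// Unknown or future hook: exit 0, mirroring juju's behaviour when a
		// charm simply does not provide a given hook.
		fmt.Fprintf(os.Stderr, "ignoring unhandled hook %q\n", hook)
	}
}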

On Fri, Aug 15, 2014 at 5:36 PM, Nate Finch nate.fi...@canonical.com wrote:
 Just wanted to let people know that Moonstone is ramping up on the customer
 pain points, even ahead of the full spec and prioritization.  I had talked
 to Jorge and Marco about what they thought was important, and they pointed
 out a couple of low hanging fruit.  This was one of them.

 Many charms these days only contain one real hook script, and the rest are
 all just symlinks to the real one.  (because no one wants to write 20
 scripts)  This is kind of a pain in the ass for charm writers, and doesn't
 work well on Windows (Windows symlink support is terrible).  So, why not
 just have a default hook that gets called if the real hook isn't there?
 That's what I implemented today:

 https://github.com/juju/juju/pull/528

 There's a new hook in town: default-hook.  If it exists and a hook gets called
 that doesn't have a corresponding hook file, default-hook gets called with
 the name of the original hook as its first argument (arg[1]).

 That's it.

 If/when this PR is accepted, Marco is planning to update charmhelpers to
 make it automatically recognize when the default-hook is called, and get the
 hook name from arg[1] instead of arg[0], so current scripts wouldn't even
 need to change - they'd just need the new charmhelpers, rename the one true
 script to default-hook, and delete all their symlinks.  Bam.

 Moonstone is very excited to be working to make Juju easier for charm
 developers, and we'll see more improvements coming next week.

 -Nate





-- 
gustavo @ http://niemeyer.net



Re: First customer pain point pull request - default-hook

2014-08-18 Thread Gustavo Niemeyer
I don't think I fully understand the proposal there. To have such a
something-changed hook, we ought to have a better mechanism to tell
*what* actually changed. In other words, we have a number of hooks
that imply a state transition or a specific notification (install,
start, config-changed, leader-elected coming, etc). Simply
calling out the charm saying stuff changed feels like a bad
interface, both in performance terms (we *know* what changed) and in
user experience (how do people use that!?).

I understand the underlying problem William is trying to solve but the
current proposal doesn't seem like a complete solution on its own, and
it also seems to change the existing understanding of the model
completely. The proposed default-hook is a trivial change to the
existing, well-known workflow.



On Sun, Aug 17, 2014 at 2:30 AM, John Meinel j...@arbash-meinel.com wrote:
 I'd just like to point out that William has thought long and hard about this
 problem, and what semantics make the most sense (does it get called for any
 hook, does it always get called, does it only get called when the hook
 doesn't exist, etc).
 I feel like had some really good decisions on it:
 https://docs.google.com/a/canonical.com/document/d/1V5G6v6WgSoNupCYcRmkPrFKvbfTGjd4DCUZkyUIpLcs/edit#

 default-hook sounds (IMO) like it may run into problems where we do logic
 based on whether a hook exists or not. There are hooks being designed like
 leader-election and address-changed that might have side effects, and
 default-hook should (probably?) not get called for those.

 I'd just like us to make sure that we actually think about (and document)
 what hooks will fall into this, and make sure that it always makes sense to
 rebuild the world on every possible hook (which is how charm writers will be
 implementing default-hook, IMO).

 John
 =:-



 On Sat, Aug 16, 2014 at 1:02 AM, Aaron Bentley aaron.bent...@canonical.com
 wrote:


 On 14-08-15 04:36 PM, Nate Finch wrote:
  There's a new hook in town: default-hook.  If it exists and a hook
  gets called that doesn't have a corresponding hook file,
  default-hook gets called with the name of the original hook as its
  first argument (arg[1]).
 
  That's it.

 Nice!  Thank you.

 Aaron








-- 

gustavo @ http://niemeyer.net



Re: getting rid of all-machines.log

2014-08-14 Thread Gustavo Niemeyer
On Thu, Aug 14, 2014 at 1:35 PM, Nate Finch nate.fi...@canonical.com wrote:
 On Thu, Aug 14, 2014 at 12:24 PM, Gustavo Niemeyer
 gustavo.nieme...@canonical.com wrote:

  Why support two things when you can support just one?

  Just to be clear, you really mean "why support two existing and well
  known things when I can implement a third thing", right?

 Yes, that is exactly what I mean.

Okay, on that basis and without any better rationale than "12factor
says I can do anything", I'd be tempted to request further analysis on
the problem if the decision were in my hands. There are more
interesting problems to solve than redoing what already exists.


gustavo @ http://niemeyer.net



Re: getting rid of all-machines.log

2014-08-14 Thread Gustavo Niemeyer
On Thu, Aug 14, 2014 at 3:14 PM, Nate Finch nate.fi...@canonical.com wrote:
 I didn't bring up 12 factor, it's irrelevant to my argument.

Is there someone else sending messages under your name?

On Thu, Aug 14, 2014 at 12:23 PM, Nate Finch nate.fi...@canonical.com wrote:
 The front page of 12factor.net says "offering maximum portability between
 execution environments" ... that's exactly what I'm going for.

 I'm trying to make our product simpler and easier to maintain.  That is all.
 If there's another cross-platform solution that we can use, I'd be happy to
 consider it.  We have to change the code to support Windows.  I'd rather the
 diff be +50 -150  than +75 -0.  I don't know how to state it any simpler
 than that.

How about simply allowing people to select their own rsyslog target?


gustavo @ http://niemeyer.net



Re: Intentionally introducing failures into Juju

2014-08-13 Thread Gustavo Niemeyer
Ah, and one more thing: when developing the chaos-injection mechanism
in the mgo/txn package, I also added a chance parameter for
either killing or slowing down a given breakpoint. It sounds like it
would be useful for juju's mechanism too. If you kill every time, it's
hard to tell whether the system would know how to retry properly.
Killing or slowing down just sometimes, or perhaps the first 2 times
out of every 3, for example, would enable the system to recover
itself, and an external agent to ensure it continues to work properly.
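
A rough sketch of that chance idea (the names and numbers are assumptions,
not the actual mgo/txn API): a breakpoint that only sometimes kills or slows
execution, so an external agent can watch the system recover rather than
merely fail.

package main

import (
	"math/rand"
	"time"
)

// maybeBreak kills with probability killChance, slows down with probability
// slowChance, and otherwise does nothing.
func maybeBreak(name string, killChance, slowChance float64) {
	switch r := rand.Float64(); {
	case r < killChance:
		panic("chaos: killed at " + name)
	case r < killChance+slowChance:
		time.Sleep(time.Duration(rand.Intn(200)) * time.Millisecond)
	}
}

func main() {
	rand.Seed(time.Now().UnixNano())
	// For example, kill 20% of the time and slow down another 30%.
	maybeBreak("before-commit", 0.2, 0.3)
}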

On Wed, Aug 13, 2014 at 11:25 AM, Gustavo Niemeyer
gustavo.nieme...@canonical.com wrote:
 That's a nice direction, Menno.

 The main thing that comes to mind is that it sounds quite inconvenient
 to turn the feature on. It may sound otherwise because it's so easy to
 drop files at arbitrary places in our local machines, but when dealing
 with a distributed system that knows how to spawn its own resources
  up, suddenly the "just write a file" approach becomes surprisingly boring
  and race prone.

 What about:

 juju inject-failure [--unit=unit] [--service=service] failure name?
 juju deploy [--inject-failure=name] ...



 On Wed, Aug 13, 2014 at 7:17 AM, Menno Smits menno.sm...@canonical.com 
 wrote:
 There's been some discussion recently about adding some feature to Juju to
 allow developers or CI tests to intentionally trigger otherwise hard to
 induce failures in specific parts of Juju. The idea is that sometimes we
 need some kind of failure to happen in a CI test or when manually testing
 but those failures can often be hard to make happen.

 For example, for changes Juju's upgrade mechanics that I'm working on at the
 moment I would like to ensure that an upgrade is cleanly aborted if one of
 the state servers in a HA environment refuses to start the upgrade. This
 logic is well unit tested but there's nothing like seeing it actually work
 in a real environment to build confidence - however, it isn't easy to make a
 state server misbehave in this way.

 To help with this kind of testing scenario, I've created a new top-level
 package called wrench which lets us drop a wrench in the works so to
 speak. It's very simple with one main API which can be called from
 judiciously chosen points in Juju's execution to decide whether some failure
 should be triggered.

 The module looks for files in $jujudatadir/wrench (typically
 /var/lib/juju/wrench) on the local machine. If I wanted to trigger the
 upgrade failure described above I could drop a file in that directory on one
 of the state servers named say machine-agent with the content:

 refuse-upgrade

 Then in some part of jujud's upgrade code there could be a check like:

 if wrench.IsActive("machine-agent", "refuse-upgrade") {
  // trigger the failure
 }
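
 A simplified sketch of what such a check might do internally (the directory
 layout follows the description above; the ownership and permission checks
 mentioned later are omitted, and this is not the actual implementation in
 the PR):

 package wrench

 import (
 	"io/ioutil"
 	"path/filepath"
 	"strings"
 )

 // isActive reports whether the wrench file for the given category lists
 // the named feature, one feature per line.
 func isActive(category, feature string) bool {
 	data, err := ioutil.ReadFile(filepath.Join("/var/lib/juju/wrench", category))
 	if err != nil {
 		return false
 	}
 	for _, line := range strings.Split(string(data), "\n") {
 		if strings.TrimSpace(line) == feature {
 			return true
 		}
 	}
 	return false
 }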

 The idea is this check would be left in the code to aid CI tests and future
 manual tests.

 You can see the incomplete wrench package here:
 https://github.com/juju/juju/pull/508

 There are a few issues to nut out.

 1. It needs to be difficult/impossible for someone to accidentally or
 maliciously activate this feature, especially in production environments. I
 have almost finished (but not pushed to Github) some changes to the wrench
 package which make it strict about the ownership and permissions on the
 wrench files. This should make it harder for the wrong person to drop files
 in to the wrench directory.

 The idea has also been floated to only enable this functionality in
 non-stable builds. This certainly gives a good level of protection but I'm
 slightly wary of this approach because it makes it impossible for CI to take
 advantage of the wrench feature when testing stable release builds. I'm
 happy to be convinced that the benefit is worth the cost.

 Other ideas on how to better handle this are very welcome.

 2. The wrench functionality needs to be disabled during unit test runs
 because we don't want any wrench files a developer may have lying around to
 affect Juju's behaviour during test runs. The wrench package has a global
 on/off switch so I plan on switching it off in BaseSuite's setup or similar.

 3. The name is a bikeshedding magnet :)  Other names that have been bandied
 about for this feature are chaos and spanner. I don't care too much so
 if there's a strong consensus for another name let's use that. I chose
 wrench over spanner because I believe that's the more common usage in
 the US and because Spanner is a DB from Google. Let's not get carried
 away...

 All comments, ideas and concerns welcome.

 - Menno





 --
 gustavo @ http://niemeyer.net



-- 
gustavo @ http://niemeyer.net



Re: Port ranges - restricting opening and closing ranges

2014-08-06 Thread Gustavo Niemeyer
Agreed, but I also agree that the error on split ranges is a good
simplification to get an implementation in place, and it also doesn't
sound super useful, so it sounds okay to fail to begin with. The other
cases are easy to handle, though.

On Wed, Aug 6, 2014 at 8:26 AM, Kapil Thangavelu
kapil.thangav...@canonical.com wrote:
 agreed. to be clear .. imo, close-port shouldn't error unless there's a type
 mismatch on inputs. ie none of the posited scenarios in this thread should
 result in an error.
 -k



 On Tue, Aug 5, 2014 at 8:34 PM, Gustavo Niemeyer gust...@niemeyer.net
 wrote:

 On Tue, Aug 5, 2014 at 4:18 PM, roger peppe rogpe...@gmail.com wrote:
  close ports 80-110 - error (mismatched port range?)

 I'd expect ports to be closed here, and also on 0-65536.


 gustavo @ http://niemeyer.net






-- 
gustavo @ http://niemeyer.net



Re: Port ranges - restricting opening and closing ranges

2014-08-06 Thread Gustavo Niemeyer
How many port ranges are typically made available? One.. Two? Sounds like a
trivial problem.

In terms of concurrency, there are issues either way. Someone can open a
port while it is being closed, and whether that works or not depends purely
on timing.

gustavo @ http://niemeyer.net
On Aug 6, 2014 9:41 AM, roger peppe roger.pe...@canonical.com wrote:

 On 5 August 2014 19:34, Gustavo Niemeyer gust...@niemeyer.net wrote:
  On Tue, Aug 5, 2014 at 4:18 PM, roger peppe rogpe...@gmail.com wrote:
  close ports 80-110 - error (mismatched port range?)
 
  I'd expect ports to be closed here, and also on 0-65536.

 I'm not sure. An advantage of requiring that exactly the
 same ports must be closed as were opened, you can use the port range
 as a key, which makes for a very simple (and trivially concurrent-safe)
 implementation in a mongo collection.

 I'd suggest that this compromise is worth it. We could always make an
 initial
 special case for 0-65535 too, if desired.



Re: Port ranges - restricting opening and closing ranges

2014-08-06 Thread Gustavo Niemeyer
Why would any well-designed application open thousands of ports
individually rather than a range? It sounds like an unreasonable use case.

I also don't get your point about concurrency. You don't seem to have
addressed the point I brought up that opening or closing ports concurrently
today already presents undefined behavior.

gustavo @ http://niemeyer.net
On Aug 6, 2014 2:53 PM, roger peppe roger.pe...@canonical.com wrote:

 On 6 August 2014 10:32, Gustavo Niemeyer gust...@niemeyer.net wrote:
  How many port ranges are typically made available? One.. Two? Sounds
 like a
  trivial problem.

 Some applications might open thousands of individual ports.
 It would be nice if it worked well in that case too.

  In terms of concurrency, there are issues either way. Someone can open a
  port while it is being closed, and whether that works or not depends
 purely
  on timing.

 When we've got several units sharing a port space, we'll want to
 keep a unique owner for each port range. That's trivial if the
 reference can be keyed by the port range, but not
 as straightforward if the lookup is two-phase.

 What we don't want is two units in the same machine to be
 able to have the same port open at the same time. I suppose
 we could rely on the fact that hooks do not execute simultaneously,
 but it would be preferable in my view to keep those
 concerns separate.

 In my view, always close the range you've opened is an easy
 to explain rule, and makes quite a few things simpler,
 without being overly restrictive.

  gustavo @ http://niemeyer.net
 
  On Aug 6, 2014 9:41 AM, roger peppe roger.pe...@canonical.com wrote:
 
  On 5 August 2014 19:34, Gustavo Niemeyer gust...@niemeyer.net wrote:
   On Tue, Aug 5, 2014 at 4:18 PM, roger peppe rogpe...@gmail.com
 wrote:
   close ports 80-110 - error (mismatched port range?)
  
   I'd expect ports to be closed here, and also on 0-65536.
 
  I'm not sure. An advantage of requiring that exactly the
  same ports must be closed as were opened, you can use the port range
  as a key, which makes for a very simple (and trivially concurrent-safe)
  implementation in a mongo collection.
 
  I'd suggest that this compromise is worth it. We could always make an
  initial
  special case for 0-65535 too, if desired.



Re: Port ranges - restricting opening and closing ranges

2014-08-06 Thread Gustavo Niemeyer
gustavo @ http://niemeyer.net
On Aug 6, 2014 3:03 PM, roger peppe roger.pe...@canonical.com wrote:

 On 6 August 2014 13:57, Gustavo Niemeyer gust...@niemeyer.net wrote:
  Why would any application well designed open thousands of ports
individually
  rather than a range? Sounds like an unreasonable use case.

 I don't know.

Ok. So let's please move on. I don't see the complexity of listing a few
things (even if it is a thousand) and removing them. It's certainly much
better than removing a thousand ports individually.

  I also don't get your point about concurrency. You don't seem to have
  addressed the point I brought up that opening or closing ports
concurrently
  today already presents undefined behavior.

 The result is undefined for a unit (a port open can fail if another
 one already has
 the port open)

Again, let's not argue anymore then. There's no real problem being created
or solved either way.


Re: Port ranges - restricting opening and closing ranges

2014-08-05 Thread Gustavo Niemeyer
On Tue, Aug 5, 2014 at 4:18 PM, roger peppe rogpe...@gmail.com wrote:
 close ports 80-110 - error (mismatched port range?)

I'd expect ports to be closed here, and also on 0-65536.


gustavo @ http://niemeyer.net



Re: help please: mongo/mgo panic

2014-07-30 Thread Gustavo Niemeyer
Alright, the guess last night was correct, and the candidate fix as
well. I've managed to reproduce the problem by stressing out the
scenario described with 4 concurrent runners running the following two
operations, while the chaos mechanism injects random slowdowns at
various critical points:

[]txn.Op{{
    C:      "accounts",
    Id:     0,
    Update: M{"$inc": M{"balance": 1}},
}, {
    C:      "accounts",
    Id:     1,
    Update: M{"$inc": M{"balance": 1}},
}}

To reach the bug, the stress test also has to run half of the
transactions in this order, and the other half with these same
operations in the opposite order, so that dependency cycles are
created between the transactions. Note that the txn package guarantees
that operations are always executed in the order provided in the
transaction.

The fix and the complete test is available in this change:

https://github.com/go-mgo/mgo/commit/3bc3ddaa

The numbers there are lower to run in a reasonable amount of time, but
to give some confidence on the fix and the code in general, I've run
this test for 100k transactions being concurrently executed with no
problems.

Also, to give a better perspective of the sort of outcome that the
logic for concurrent runners produces, this output was generated by
that test while running for 100 transactions:

http://paste.ubuntu.com/7906618/

The tokens like a) in these lines are the unique identifier for a
given transaction runner. Note how every single operation is executed
in precise lock-step despite the concurrency and the ordering issues,
even assigning the same revision to both documents since they were
created together.

Also, perhaps most interestingly, note the occurrences such as:

[LOG] 0:00.180 b) Applying 53d92a4bca654539e703_7791e1dc op 0
(update) on {accounts 0} with txn-revno 2: DONE
[LOG] 0:00.186 d) Applying 53d92a4bca654539e703_7791e1dc op 1
(update) on {accounts 1} with txn-revno 2: DONE

Note the first one is b) while the second one is d), which means
there are two completely independent runners, in different goroutines
(might as well be different machines), collaborating towards the
completion of a single transaction.

So, I believe this is sorted. Please let me know how it goes there.



On Wed, Jul 30, 2014 at 4:14 AM, Gustavo Niemeyer
gustavo.nieme...@canonical.com wrote:
 Okay, I couldn't resist investigating a bit. I've been looking at the
 database dump from earlier today and it's smelling like a simpler bug
 in the txn package, and I might have found the cause already.

 Here is a quick walkthrough while debugging the problem, to also serve
 as future aid in similar quests.

 Enabling full debug for the txn package with SetDebug and SetLogger,
 and doing a ResumeAll to flush all pending transactions, we can
 quickly get to the affected document and transaction:

 2014/07/30 02:19:23 Resuming all unfinished transactions
 2014/07/30 02:19:23 Resuming 53d6057930009a01ba0002e7 from prepared
 2014/07/30 02:19:23 a) Processing 53d6057930009a01ba0002e7_dcdbc894
 2014/07/30 02:19:23 a) Rescanning 53d6057930009a01ba0002e7_dcdbc894
 2014/07/30 02:19:23 a) Rescanned queue with
 53d6057930009a01ba0002e7_dcdbc894: has prereqs, not forced
 2014/07/30 02:19:23 a) Rescanning 53d6057930009a01ba0002eb_98124806
 2014/07/30 02:19:23 a) Rescanned queue with
 53d6057930009a01ba0002eb_98124806: has prereqs, not forced
 2014/07/30 02:19:23 a) Rescanning 53d6057930009a01ba0002ee_a83bd775
 2014/07/30 02:19:23 a) Rescanned document {services ntp} misses
 53d6057930009a01ba0002ee_a83bd775 in queue:
 [53d6057930009a01ba0002eb_98124806 53d6057930009a01ba0002ea_4ca6ed41
 53d6057c30009a01ba0002fd_4d8d9123 53d6057e30009a01ba000301_ba0b61dd
 53d6057e30009a01ba000303_a26cb429]
 2014/07/30 02:19:23 a) Reloaded 53d6057930009a01ba0002ee_a83bd775: prepared
 panic: rescanned document misses transaction in queue

 So this error actually means something slightly different from what I
 pointed out in the bug.

 The transaction runner state machine creates transactions in the
 preparing state, and then moves them over to prepared once all
 affected documents have been tagged with the transaction id+nonce. So what
 this means is that there is a transaction in progress in the prepared
 state, while the actual document misses the id in its local queue,
 which is an impossible situation unless the document was fiddled with,
 there was corruption, or a bug in the code.

 So, let's have a look at the affected documents. First, the document
 being changed:

 db.services.findOne({_id: "ntp"})
 http://paste.ubuntu.com/7902134/

 We can see a few transactions in the queue, but the one raising the
 issue is not there as reported by the error.

 And this is full transaction raised by the error:

 db.txns.findOne({_id: ObjectId("53d6057930009a01ba0002ee")})
 http://paste.ubuntu.com/7902095/

 One interesting thing we can do from here is verifying

Re: help please: mongo/mgo panic

2014-07-29 Thread Gustavo Niemeyer
We've got a database dump yesterday, which gives me something to
investigate. I'll spend some time on this tomorrow (today) and report back.

On Wed, Jul 30, 2014 at 1:34 AM, Menno Smits menno.sm...@canonical.com wrote:
 All,

 Various people have been seeing the machine agents panic with the following
 message:

panic: rescanned document misses transaction in queue

 The error message comes from mgo but the actual cause is unknown. There's
 plenty of detail in the comments for the LP bug that's tracking this. If you
 have any ideas about a possible cause or how to debug this further please
 weigh in.

 https://bugs.launchpad.net/juju-core/+bug/1318366

 Thanks,
 Menno





-- 

gustavo @ http://niemeyer.net



Re: Mongo experts - help need please

2014-07-25 Thread Gustavo Niemeyer
On Fri, Jul 25, 2014 at 2:37 AM, Ian Booth ian.bo...@canonical.com wrote:
 The tests passed for me every time also, with and without independent 
 sessions.
 If I loaded my machine to max out CPU usage to 100%, then the tests (different
 ones each run) would fail intermittently but reproducibly every time with
 session copy, but I could not induce even one failure without session copying.

As I mentioned, it sounds like a concurrency or timing issue, which
isn't really surprising given that the code at hand is indeed time
sensitive, and that session.Copy will alter significantly the timing
characteristics of the test.

This is at the top of the test file:

// worstCase is used for timeouts when timing out
// will fail the test. Raising this value should
// not affect the overall running time of the tests
// unless they fail.
worstCase = testing.LongWait

// justLongEnough is used for timeouts that
// are expected to happen for a test to complete
// successfully. Reducing this value will make
// the tests run faster at the expense of making them
// fail more often on heavily loaded or slow hardware.
justLongEnough = testing.ShortWait

// fastPeriod specifies the period of the watcher for
// tests where the timing is not critical.
fastPeriod = 10 * time.Millisecond

// slowPeriod specifies the period of the watcher
// for tests where the timing is important.
slowPeriod = 1 * time.Second


gustavo @ http://niemeyer.net



Re: Mongo experts - help need please

2014-07-25 Thread Gustavo Niemeyer
On Fri, Jul 25, 2014 at 5:29 AM, Stuart Bishop
stuart.bis...@canonical.com wrote:
 On 25 July 2014 12:05, Gustavo Niemeyer gustavo.nieme...@canonical.com 
 wrote:
 The bug Ian cites and is trying to work around has sessions failing
 with an i/o error after some time (I'm guessing resource starvation in
 MongoDB or TCP networking issues). session.Copy() is pulling things
 from a pool, so it might be handing out sessions doomed to fail with
 exactly the same issue. The connections in the pool could even be
 perfectly functional when they went in, with no way at the go level of
 knowing they have failed without trying them.

That's not actually the bug Ian is asking about in this thread.

The reason why the timeouts happen is well understood: MongoDB has a
fixed timeout of 10 minutes, and mgo right now does not concurrently
ping a socket that was reserved for a session. Using a single session
forever and never calling Refresh on it will surely timeout if it
stays unused for that long.

The solution is simple: call Refresh at a control point (where that is
depends on the application shape) or Close a copy of the session and
let the pool internally deal with it, and do handle any errors when
they happen.
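
As a minimal sketch of that per-operation pattern (the database, collection,
and field names are placeholders): copy the session, do the work, and Close
the copy so the pool can retire any bad socket.

package main

import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// incBalance borrows a socket from the pool via Copy, performs one update,
// and returns the socket on Close.
func incBalance(root *mgo.Session, id interface{}, delta int) error {
	s := root.Copy()
	defer s.Close()
	return s.DB("juju").C("accounts").UpdateId(id, bson.M{"$inc": bson.M{"balance": delta}})
}

func main() {
	root, err := mgo.Dial("localhost")
	if err != nil {
		panic(err)
	}
	defer root.Close()
	if err := incBalance(root, 0, 1); err != nil {
		// Handle the error; the closed copy lets the pool discard any bad
		// socket, so a retry would get a fresh one.
		panic(err)
	}
}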

 If this is the case, then Ian would need to handle the failure by
 ensuring the failed connection does not go back in the pool and
 grabbing a new one (the defered Close() will return it I think). And
 repeating until it works, or until the pool has been exhausted and we
 know Mongo is actually down rather than just having a polluted pool.

There's no reason to do that. The pool can deal with connection errors
and timeouts, and collects bad sockets appropriately.

Trying to ensure a bad socket never comes out of the pool is also a
bad path. It's impossible to guarantee that a socket obtained from mgo
or any other database driver is indeed in perfect state. Failures can
happen the nanosecond after any tests are made. The reliable way is to
handle errors appropriately, fallback to a sane path, and retry from
there.


gustavo @ http://niemeyer.net



Re: Mongo experts - help need please

2014-07-24 Thread Gustavo Niemeyer
On Fri, Jul 25, 2014 at 1:02 AM, Ian Booth ian.bo...@canonical.com wrote:
 We've transitioned to using Session.Copy() to address the situation whereby 
 Juju
 would create a mongo collection instance and then continue to make db calls
 against that collection without realising the underlying socket may have 
 become
 disconnected. This resulted in Juju components failing, logging i/o timeout
 errors talking to mongo, even though mongo itself was still up and running.

Sounds sane, as I indicated in previous discussions about the topic in
these last two weeks and also about a year ago when we covered that.
Serializing every single request to a concurrent server via a single
database connection seems like a pretty bad idea for anything but
simplistic servers.

 As an aside - I'm wondering whether the mgo driver shouldn't transparently 
 catch
 an i/o error associated with a dead socket and retry using a fresh connection
 rather than imposing that responsibility on the caller?

The evidence so far indicates that this will likely not happen. The
current design was purposefully put in place so that harsh connection
errors are not swept under the rug, and this seems to be working well
so far. I'd rather not have juju proceeding over a harsh problem such
as a master re-election midway through the execution of an algorithm
without any indication that the failure has happened, let alone
silently retry operations that in most cases are not idempotent.

That said, the goal is of course not to make the developer's life
miserable. All the driver wants is an acknowledgement that the error
was perceived and taken care of. This is done trivially by calling:

session.Refresh()

Done. The driver will happily drop the error notice, and proceed with
further operations, blocking if waiting for a re-election to take
place is necessary.

That said, as stated above using a single session for _everything_
might not be a good idea for other reasons.

(...)
 If session.Copy() doesn't work here, what's the approach to use to ensure the
 watcher just doesn't become dead because the underlying socket dies? Or how 
 can
 we make the session.Copy() approach work always even when the host machine is
 under high load? Or maybe watcher code is fine and the tests are wrong?

This feels very much like a concurrency or timing issue. You might
also be misunderstanding what session.Copy does.. it's not so magic.
If session.Copy truly prevented the watcher from working, it wouldn't
work at all either way. Every independent process that connects to the
database and does a change is monitored by watchers that live in
different sessions.

 The tests are quite simple:

I'm not able to observe the test failure you mention after hacking it
to use independent sessions:

http://paste.ubuntu.com/7852418/


gustavo @ http://niemeyer.net



Re: series-agnostic charm URLs

2014-07-23 Thread Gustavo Niemeyer
On Wed, Jul 23, 2014 at 7:35 AM, roger peppe rogpe...@gmail.com wrote:
 We want to store charm URLs in mongo-db that are agnostic whether
 the series is specified or not. For example, in a bundle, a service
 is free to specify a series in the charm name or not.

That sounds slightly surprising. How do we plan to define what the
bundle actually means?

While having one or two types to represent the concept may be argued
back and forth, there's an underlying concept that is important: one
form is a loose wildcard that has to be resolved depending on context
before being useful, and was originally designed to be used in command
lines and such, while the other is a more formal specification
(must have a schema, must have a series). Accepting the loosely defined
form in a bundle seems surprising, even if it just means not having a
series, given that deploying the bundle would hopefully be somewhat
deterministic in terms of which distributions are being used.

 I'd like to suggest that we remove the Reference type and use the URL
 type throughout, allowing it to have an unspecified series
 where the string form does not specify a series.

 This means that the URL type would be an exact reflection of the string
 form of a charm URL.

As noted above, a Reference may not have a schema as well, so this
suggestion seems to imply that foo becomes a valid URL. Maybe having
just URL could be made cleaner, though. This should be judged based on
a more detailed proposal.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: series-agnostic charm URLs

2014-07-23 Thread Gustavo Niemeyer
On Wed, Jul 23, 2014 at 9:13 AM, Richard Harding
rick.hard...@canonical.com wrote:
 This is driven by requirements from ecosystem and users where bundles
 define a 'solution'. A mongodb-cluster bundle doesn't need to be updated
 every time a new revision comes out, or even if a new series comes out. It
 is a usable solution regardless. Bundles can be as specific as they wish to
 be, however requiring them to define charms specifically reduces their
 reusability and causes us to be less flexible.

When you design a system there's always a tension between what people
need and what they think they need. Speaking of a different area close
to our hearts, programming languages such as Perl evolved with the
author listening to user requests.. developers, even fairly experienced
ones, tend to want to pack as much power into as few keystrokes as
possible, and a language that has a very high rate of meaning per
keystroke is often deemed an expressive and powerful programming
language. That feeling presumes that there is a high cost in typing a
bit more, but as time passes we're learning that the semantic load
carries a more relevant cost of its own, and simpler but consistent primitives
often yield better results.

Going back to bundles, not having to update a bundle when a new,
entirely different, release of Ubuntu comes out, is of course much
more expressive, and people love expression, but carries with it a
relevant semantic load. It also means neither we nor anybody else has
any idea about what people actually get when they deploy a bundle, and
whether the bundle will even work tomorrow once a new major upgrade is
pushed to the repository. Our focus should not be to encourage that,
but to help people express what they mean clearly and easily. If they
want a new release of the bundle with a slightly different meaning,
that should be trivial, but it should not be trivial to express lack
of clarity.

 We also have to worry about historical usage as we've always supported the
 vague behaviour and many of the current of bundles take advantage of it.

Yes, bundles were very organically developed. But I won't re-raise that rant.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: series-agnostic charm URLs

2014-07-23 Thread Gustavo Niemeyer
On Wed, Jul 23, 2014 at 9:59 AM, roger peppe roger.pe...@canonical.com wrote:
 The charm URL in a bundle means exactly what it would mean if
 you typed it in a juju deploy command. That is, it is dependent
 on the charms available at bundle deploy time.

I would fix that instead.

 I do believe having just URL would be significantly cleaner.
 What area would you like to see more detail on?

The code review, but it doesn't have to be me judging it.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Enhancing our IRC bot?

2014-07-23 Thread Gustavo Niemeyer
Great timing, Kate.

I was recently asked to take care of mup's deployment again, and I'm
about to put live its third incarnation, reviving a hack I started
back in 2011 to port the ancient Erlang bot I wrote too many years ago
into a Go version. My goal, among other things, is to make plugin
writing a lot easier, so this kind of problem fits well. I'm just
finishing a few details and will send some notes soon.

On Wed, Jul 23, 2014 at 11:55 AM, Katherine Cox-Buday
katherine.cox-bu...@canonical.com wrote:
 Hey all,

 I thought my first post to the list would be something relatively innocuous
 :)

 Have we ever considered enhancing our IRC bot to report CI status? Maybe
 start off with important notifications such as job failures? It might bring
 more attention to the health of trunk, and IRC is already a major
 communication hub.

 Interested in your thoughts!

 -
 Katherine

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev




-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Charm store API proposal, new version

2014-07-16 Thread Gustavo Niemeyer
On Tue, Jul 15, 2014 at 7:05 PM, Richard Harding
rick.hard...@canonical.com wrote:
 It is listed under known clients in the spec, and we mentioned your request
 down below.  What we lack is your specific use cases as no one working on
 the spec is knowledgeable about how you're using the api.

Besides what others have said, requiring everyone not only to review
their own usage of the existing public APIs, but also to justify their
cases in a convincing way just to prevent you from breaking the
existing use cases, is a pretty bad approach to API compatibility.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: RFC: mongo _id fields in the multi-environment juju server world

2014-07-07 Thread Gustavo Niemeyer
On Mon, Jul 7, 2014 at 10:09 AM, roger peppe roger.pe...@canonical.com wrote:
 I had assumed that because every client needs to see every transaction
 there would likely be no benefit to sharding the log, although
 technically you could shard on transaction id. I'd be

Clients don't need to see every transaction. Only those that affect
the documents they are acting on.

 Thanks for pointing this out. If we manage to hugely scale juju using mongodb
 I will be very happy. I still think we should do some measurements to
 convince us that we actually have some hope of doing so though.
 My own measurements left me less than convinced of the
 possibility, although it's been a while since I did them.

When you measured a sharded setup, what was the outcome?


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: RFC: mongo _id fields in the multi-environment juju server world

2014-07-07 Thread Gustavo Niemeyer
On Mon, Jul 7, 2014 at 2:03 PM, roger peppe roger.pe...@canonical.com wrote:
 The latter might turn out to be quite awkward, though there's
 probably a nice solution I don't see.

 Suppose we've got three environments, A, B and C.

 We have transactions that span {A, B}, {B, C} and {C, A}.

 How can we choose a consistent shard key for all those
 transactions?

What is a consistent shard key and why does it matter?

 Okay, so the measurements that left you unconvinced that sharding
 might help to scale up were not using sharding.

 If we struggle to meet the requirements for a single environment,
 we're unlikely to meet them when we're running several environments
 per shard, which is surely necessary if we're to scale up.

That's unsound reasoning for the context. It implies that to be able
to meet a load demand with many serving machines we must be able to
meet the load demand with a single serving machine. Not true.

 I hope it can work for us.

 I really do.

I do as well.

 I just worry that without actually doing some measurement in advance,
 we may spend a lot of time working on this stuff and find that it was all for
 nought because we're fundamentally bottlenecked somewhere
 we didn't anticipate.

By all means, please do measure and collect as much data as necessary
to have a good design. We won't see any performance improvements
without a reasonable understanding of how the system works and
performs.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: move towards using gopkg.in

2014-07-07 Thread Gustavo Niemeyer
On Mon, Jul 7, 2014 at 6:00 PM, Ian Booth ian.bo...@canonical.com wrote:
 I'm somewhat wary of depending on another unknown third party website
 being

That's hilarious. I haven't been pushing for its usage on juju, and
I'm still not the one actively pushing it, but that's a pretty bad
argument to raise here.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: move towards using gopkg.in

2014-07-07 Thread Gustavo Niemeyer
On Mon, Jul 7, 2014 at 7:18 PM, Ian Booth ian.bo...@canonical.com wrote:
 It wasn't meant to be funny. I'm unsure why it's a bad argument. It's quite
 prudent to ensure that critical infrastructure on which our development 
 depends
 meets expectations with regard to uptime, reliability etc (a case in point 
 being
 the recent issue with an out of date certificate or so I was told). Sorry if 
 the
 question caused any offence. I raised the question totally independent of that
 fact that someone within Canonical had set up the site.

You can't both say that it is totally independent from someone next
to you being responsible for it, and that it's about being an
unknown third party.

If your worries are about reliability, there is public track record
with the uptime since it was put online
(http://stats.pingdom.com/r29i3cfl66c0), and that uptime is supported
by replicated deployments across separate cities with automatic
failover.

Any other concerns?


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: move towards using gopkg.in

2014-07-07 Thread Gustavo Niemeyer
On Mon, Jul 7, 2014 at 8:49 PM, David Cheney david.che...@canonical.com wrote:
 I don't want to introduce another thing to break CI; we already pull
 from github, which is bad enough, but going via gopkg.in introduces an
 additional point of failure which can further reduce the already
 bullet-ridden credibility of our CI.

Again, gopkg.in sits in a reliable deployment, with a provable track record.

 I also don't want to start introducing versioned import paths into
 Juju without serious discussion of how to prevent two different
 versions of a package being imported transitively.

  go list -f '{{range .Deps}}{{printf "%s\n" .}}{{end}}' | grep gopkg.in \
  | sort -u | sed 's/\.v[0-9]\+$/\.vN/' | uniq -c | sed '/ 1 /d'

 I am NOT LGTM on any change that introduces gopkg.in redirected import
 paths until the issue above is resolved.

Okay, that's done.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: RFC: mongo _id fields in the multi-environment juju server world

2014-07-04 Thread Gustavo Niemeyer
On Fri, Jul 4, 2014 at 6:01 AM, roger peppe roger.pe...@canonical.com wrote:
 There is another possiblity: we could just use a different collection
 name prefix for each environment. There is no hard limit on the number
 of collections in mongo (see 
 http://docs.mongodb.org/manual/reference/limits/).

For sharding and for good space management in general it's better to
have data in a collection that gets automatically managed by the
cluster. It's also much simpler to deal with in general, even if it
does require code changes to get started.

 - for a small environment, table indexes remain small and lookups fast
 even though the total number of entries might be huge.

Same as above: when it gets _huge_ you need sharding either way, and
it's easier and more efficient to manage a single collection than 10k.

 - each environment could have a separate mongo txn log, so one busy
 environment that's constantly adding transactions will not necessarily
 slow down all the others. There is, in general, no need for sequential
 consistency between
 environments.

With txn there's no sequential consistency even within the same
environment, if you're touching different documents.

 - database isolation between environments is an advantage when things
 go wrong - it's easier to fix or delete individual environments if their
 tables are isolated from one another.

Sure, it prevents bad mistakes caused by not taking the environment id
into consideration, but deleting "foo:*" is just as easy.

 I suggest that, at the least, taking this approach would be a quick
 road to making the state work with multiple environments. It
 would not preclude a move to changing to use composite keys
 in the future.

We already know it's a bad idea today. Let's please not make that mistake.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: RFC: mongo _id fields in the multi-environment juju server world

2014-07-04 Thread Gustavo Niemeyer
On Fri, Jul 4, 2014 at 10:32 AM, roger peppe roger.pe...@canonical.com wrote:
 It won't be possible to shard the transaction log.

Why not?

 The thing I'm trying to get across is: until we know one way or
 another, I believe it would be better to choose the (much) simpler
 option and use the (potential weeks of) dev time for other things.

We know it's a bad idea. Besides everything else I mentioned, there
are _huge_ MongoDB databases out there that depend on sharding
to scale.. we're talking hundreds of machines. It seems very naive to
go with a model that loses the benefits of all the lessons the MongoDB
development team learned with those use cases, and the work they have
done to support them well.

We have been there in Canonical. Ask folks about the CouchDB story.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Port ranges - restricting opening and closing ranges

2014-06-26 Thread Gustavo Niemeyer
+1 to Mark's point. Handling exact matches is much easier, and does
not prevent a fancier feature later, if there's ever the need.

On Thu, Jun 26, 2014 at 3:38 PM, Mark Ramm-Christensen (Canonical.com)
mark.ramm-christen...@canonical.com wrote:
 My belief is that as long as the error messages are clear, and it is easy to
 close 8000-9000 and then open 8000-8499 and 8600-9000, we are fine. Of
 course it is nicer if we can do that automatically for you, but I don't
 see why we can't add that later, and I think there is a value in keeping a
 port-range as an atomic data-object either way.

 --Mark Ramm


 On Thu, Jun 26, 2014 at 2:11 PM, Domas Monkus domas.mon...@canonical.com
 wrote:

 Hi,
 me and Matthew Williams are working on support for port ranges in juju.
 There is one question that the networking model document does not answer
 explicitly and the simplicity (or complexity) of the implementation depends
 greatly on that.

 Should we only allow units to close exactly the same port ranges that they
 have opened? That is, if a unit opens the port range [8000-9000], can it
 later close ports [8500-8600], effectively splitting the previously opened
 port range in half?

 Domas

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev




-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Thoughts to keep in mind for Code Review

2014-06-25 Thread Gustavo Niemeyer
Agreed, but for a slightly different reason. The suggestion is to
annotate the patch with the reason for the change, rather than the
code itself, which might indeed lead to a different kind of comment.
While this might be useful, one of the interesting outcomes of code
reviewing is that it forces the final logic to go through different
eyes and mindsets. The "I don't get it" is not always a bad thing in a
review.. it's rather the reason why simplifications and entirely
different approaches are suggested. Many times I consciously avoid
reading an on-going discussion in the review before doing my own
review, precisely so I can get a fresh perspective on the code before
getting to know everyone else's. Then, with inline reviewing, saying
"Please tell me why you did this" is very cheap on both ends.


On Wed, Jun 25, 2014 at 1:42 AM, Ian Booth ian.bo...@canonical.com wrote:
 -1 on annotations. If you need to annotate to make it clearer then that should
 be done as code comments so the next poor soul who reads the code has a clue 
 of
 what's been done

 On 25/06/14 14:20, John Meinel wrote:
 An interesting article from IBM:
 http://www.ibm.com/developerworks/rational/library/11-proven-practices-for-peer-review/

 There is a pretty strong bias for "we found these results and look at how
 our tool makes it easier to follow these guidelines", but the core results
 are actually pretty good.

 I certainly recommend reading it and keeping some of it in mind while
 you're both coding and reviewing. (Particularly how long should code review
 take, and how much code should be put up for review at a time.)
 Another trick that we haven't made much use of is to annotate the code we
 put up for review. We have the summary description, but you can certainly
 put some inline comments on your own proposal if you want to highlight
 areas more clearly.

 John
 =:-




 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: 
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Thoughts to keep in mind for Code Review

2014-06-25 Thread Gustavo Niemeyer
Thanks, John. Several nice ideas there. I especially like the data
backing the first few points.. it provides evidence for something we
intuitively understand.

I also wrote some points about this same topic, but from a slightly
different perspective, last year:

http://blog.labix.org/2013/02/06/ethics-for-code-reviewers


On Wed, Jun 25, 2014 at 1:20 AM, John Meinel j...@arbash-meinel.com wrote:
 An interesting article from IBM:
 http://www.ibm.com/developerworks/rational/library/11-proven-practices-for-peer-review/

 There is a pretty strong bias for "we found these results and look at how
 our tool makes it easier to follow these guidelines", but the core results
 are actually pretty good.

 I certainly recommend reading it and keeping some of it in mind while you're
 both coding and reviewing. (Particularly how long should code review take,
 and how much code should be put up for review at a time.)
 Another trick that we haven't made much use of is to annotate the code we
 put up for review. We have the summary description, but you can certainly
 put some inline comments on your own proposal if you want to highlight areas
 more clearly.

 John
 =:-

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev




-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: This is why we should make go get work on trunk

2014-06-06 Thread Gustavo Niemeyer
go is the default build tool, and the vast majority of go projects work
out of the box with go get. If we cannot make it work, that's fine, but
looking at other projects that cannot get it to work is no excuse. If you
guys can make it work, even if we continue to support godep(s), by all
means do it. Not only is it a better welcome for Go developers, but it also
means these pieces can more easily be used in other projects too, without
having to import the whole build system.


On Fri, Jun 6, 2014 at 6:11 PM, Kapil Thangavelu 
kapil.thangav...@canonical.com wrote:

 just as it fails for many other projects.. etcd, docker, serf, consul,
 etc... most larger projects are going to run afoul of trying to do cowboy
 dependency management and adopt one of the extant tools for managing deps
 and have a non-standard install explained to users in its readme, else it's
 vendoring its deps.

 -k





 On Fri, Jun 6, 2014 at 5:05 PM, Nate Finch nate.fi...@canonical.com
 wrote:

 (Resending since the list didn't like my screenshots)

 https://twitter.com/beyang/statuses/474979306112704512

 https://github.com/juju/juju/issues/43

 Any tooling that exists for go projects is going to default to doing go
 get.  Developers at all familiar with go, are going to use go get.

 People are going to do

 go get github.com/juju/juju

 and it's going to fail to build, and that's a terrible first impression.

 Yes, we can update the README to tell people to run godeps after running
 go get, but many people are not going to read it until after they get the
 error building.

 Here's my suggestion:

 We make go get work on trunk and still use godeps (or whatever) for
 repeatable builds of release branches.

 There should never be a time when tip of trunk and all dependent repos
 don't build.  This is exceedingly easy to avoid.

 Go crypto (which I believe is what is failing above) is one of the few
 repos we rely on that isn't directly controlled by us.  We should fork it
 so we can control when it updates (since the people maintaining it seem to
 not care about making breaking API changes).
  -Nate

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev




-- 

gustavo @ http://niemeyer.net
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: GitHub issues

2014-06-05 Thread Gustavo Niemeyer
The comment was made with the understanding that this was your original
plan, and the point is to measure engagement before closing it down, or
you'll never know whether it makes any difference for juju specifically.
Also,  isn't Launchpad able to track issues originally filed on other
trackers? That used to be one of its big selling points for distro work.

gustavo @ http://niemeyer.net
On Jun 4, 2014 10:43 PM, Ian Booth ian.bo...@canonical.com wrote:

 Actually the original plan was not to enable Github's issue tracker and
 continue
 using Launchpad's. Having 2 issue trackers is not optimal and will create
 too
 much management overhead and wasted effort. We are continuing to use
 Launchpad's
 milestones for scoping and planning releases etc and of course this all
 ties in
 with Launchpad's issue tracker.

 So I'd prefer to stick with the plan and disable Github's tracker. This
 was
 meant to be done when the repo was set up.

 On 05/06/14 00:23, Gustavo Niemeyer wrote:
  I would keep them around for a while and try to observe how the
  community reacts to the availability. If people don't care, then just
  closing it sounds fine. If you start to get engagement there, might be
  worth going over the trouble of supporting users that live in that
  ecosystem. My experience has been that I got significantly more
  engagement, including bugs, once moving over projects to github.
 
  On Wed, Jun 4, 2014 at 10:13 AM, Curtis Hovey-Canonical
  cur...@canonical.com wrote:
  On Wed, Jun 4, 2014 at 6:36 AM, Andrew Wilkins
  andrew.wilk...@canonical.com wrote:
 
  What are our options? Is it simplest just to disable GitHub issues,
 and have
  the lander pick up fixes lp:NN and add a comment to the bug in
  Launchpad?
 
  I think this is the easiest path.
 
 
 
  --
  Curtis Hovey
  Canonical Cloud Development and Operations
  http://launchpad.net/~sinzui
 
  --
  Juju-dev mailing list
  Juju-dev@lists.ubuntu.com
  Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev
 
 
 

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: not rebasing after PR?

2014-06-05 Thread Gustavo Niemeyer
FWIW, I pretty much never rebase in my usual development workflow. I'm
surprised to hear it became a norm somehow.

On Thu, Jun 5, 2014 at 2:06 PM, roger peppe rogpe...@gmail.com wrote:
 I'd love to ditch rebasing if it was reasonable to do so.
 It just adds overhead to an already tiresome procedure.


 On 5 June 2014 16:22, Nate Finch nate.fi...@canonical.com wrote:
 I am far from a git expert, but it sounds like we can get a bzr-like
 overview of merges to trunk if we give git the right command. This is from
 the canonical-tech discussion:

 (from Dimitri John Ledkov)
 On Thu, Jun 5, 2014 at 2:26 PM, Ian Booth ian.bo...@canonical.com wrote:
 
(from Nate Finch)
   As for bzr versus git, I honestly don't see much of a difference.  I
   know
   there are things that bzr does better than git, but they're not
   features I
   really ever used, so I don't miss them.
  
 
  What about all the complications, hassle, and extra overhead with the
  need to
  rebase all the time due to git's logging model? There's just no need for
  that in
  bzr so the workflow is *much* simpler [0].
 bzr defaults to showing just the first parent only, but you can see
 all the gory details with $ bzr log -n 0.
 git defaults to gory details, but you can get equivalent to bzr
 default view as well, e.g. compare output of:
 $ git log --oneline --graph --decorate
 with
 $ git log --oneline --graph --decorate --first-parent
 If one consistently merges in, individual branches only, git will
 generate the same graph history as bzr and will be able to present it
 the same way bzr would.


 This sounds like it might solve some of the problems we're worrying about
 that get caused by rebasing, such as losing comments etc.

 It sounds like this might be a usable workflow:

 commit several times to your feature branch.
 rebase into a single commit
 submit pull request
 comment on pull request & commit patches to pull request
 merge pull request as-is (with extra commits after submit)


 This mashes all your pre-PR commits into one, so hides some commit spam that
 way, but then keeps the post-PR commits, to preserve comments.  It sounds
 like we can still get a list of just the merges from git, to exclude all the
 commits during code review.

 This sounds like the best of both worlds (or as close as we can get) and
 removes one more step (rebasing after code review changes), which seems like
 a good thing.

 Thoughts?

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: 
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 
gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: GitHub issues

2014-06-04 Thread Gustavo Niemeyer
I would keep them around for a while and try to observe how the
community reacts to the availability. If people don't care, then just
closing it sounds fine. If you start to get engagement there, might be
worth going over the trouble of supporting users that live in that
ecosystem. My experience has been that I got significantly more
engagement, including bugs, once moving over projects to github.

On Wed, Jun 4, 2014 at 10:13 AM, Curtis Hovey-Canonical
cur...@canonical.com wrote:
 On Wed, Jun 4, 2014 at 6:36 AM, Andrew Wilkins
 andrew.wilk...@canonical.com wrote:

 What are our options? Is it simplest just to disable GitHub issues, and have
 the lander pick up fixes lp:NN and add a comment to the bug in
 Launchpad?

 I think this is the easiest path.



 --
 Curtis Hovey
 Canonical Cloud Development and Operations
 http://launchpad.net/~sinzui

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: 
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Juju, mongo 2.6 and labix.org/v2/mgo issue

2014-05-28 Thread Gustavo Niemeyer
It's indeed being updated. The frequent sprints haven't been helping, but
I'm hoping to have a new release out next week.

gustavo @ http://niemeyer.net
On May 28, 2014 8:19 AM, Ian Booth ian.bo...@canonical.com wrote:

 Hi all

 I'm testing Juju with Mongo 2.6 to evaluate how that affects our remaining
 intermittent unit test failures.

 I've compiled a copy of Mongo 2.6 and have been able to bootstrap an
 environment
 with no issues. Great so far.

 However, the tests aren't happy. eg the tests in agent/mongo fail as do a
 bunch
 of others.

 It seems Mongo 2.4 - 2.6 has changed the way admin users are created. In
 Juju,
 we have a EnsureAdminUser() function. It does this:

 session.DB("admin").AddUser(p.User, p.Password, false)

 That fails with:

 not authorized for upsert on admin.system.users

 Fine, so the AddUser API doc in the mgo driver says to use UpsertUser for
 mongo
 2.4 or greater:

 session.DB("admin").UpsertUser(&mgo.User{
     Username: p.User, Password: p.Password,
     Roles:    []mgo.Role{mgo.RoleUserAdminAny}})

 It still fails the same way.

 So I reverted to calling the createUser command directly as per the Mongo
 2.6 docs:

 session.DB("admin").Run(bson.D{
     {"createUser", p.User},
     {"pwd", p.Password},
     {"roles", []mgo.Role{mgo.RoleUserAdminAny}}},
 nil)

 The above works for the initially failing tests in agent/mongo. I haven't
 re-run
 the entire suite again though. It may be further tweaks are required.

 I can easily continue using the last construct above, but it *seems* that
 the
 mgo driver may need updating? Am I missing something?

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Ensuring tests pass on gccgo

2014-05-22 Thread Gustavo Niemeyer
On Wed, May 21, 2014 at 10:43 PM, Ian Booth ian.bo...@canonical.com wrote:
 We are working to make all juju-core unit tests pass using gccgo. In case you
 didn't already know, there's a common issue which has caused a lot of the
 failures to date. Here's a quick heads up on how to deal with it.

 golang-go and gcc-go have different map implementations which results in
 ordering differences, affecting things like range etc (simplistically put,
 gcc-go's map ordering is random whereas currently golang-go is somewhat
 deterministic).

This is changing in the main compiler as well, in Go 1.3:

http://tip.golang.org/doc/go1.3#map

So it'll become even less deterministic there as well.

 Now of course, maps are unordered but what we sometimes do in
 the code is to use a map to hold some data (maybe to eliminate duplicates) and
 then expose that data via a slice or array. If we then do a c.Assert(v1,
 gc.DeepEquals, v2), it will fail on gcc-go, since the order of items in the 2
 slices is different, even though the values are the same.

If that's really the case, it's definitely a bug in gccgo. gocheck's
DeepEquals is implemented in terms of reflect.DeepEqual, which should
not care about the map order. In the standard library of the main
compiler, it clearly does not:

    for _, k := range v1.MapKeys() {
        if !deepValueEqual(v1.MapIndex(k), v2.MapIndex(k), visited, depth+1) {
            return false
        }
    }

So gocheck's DeepEquals is fine for such map tests, assuming no bugs
in the underlying implementation.
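
To make the failure mode above concrete, here's a small sketch
(illustrative code, not taken from juju) of a slice derived from a map
and a comparison that does not depend on iteration order:

    package example

    import (
        "reflect"
        "sort"
    )

    // namesFromSet flattens a set (a map used to drop duplicates) into a
    // slice. Its order depends on map iteration order, which is random
    // under gcc-go and, from Go 1.3, under the main compiler as well.
    func namesFromSet(set map[string]bool) []string {
        var names []string
        for name := range set {
            names = append(names, name)
        }
        return names
    }

    // equalNames sorts both slices before comparing, so the result does
    // not depend on iteration order. In a gocheck test, sorting before
    // c.Assert(got, gc.DeepEquals, want) has the same effect.
    func equalNames(got, want []string) bool {
        sort.Strings(got)
        sort.Strings(want)
        return reflect.DeepEqual(got, want)
    }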


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Implementing Juju Actions

2014-03-27 Thread Gustavo Niemeyer
On Thu, Mar 27, 2014 at 12:05 PM, James Solomon binary...@gmail.com wrote:
 I'd like to clarify what I'm understanding here: we are to implement the new
 commands alongside deploy and set as verbs belonging to the Charm code.
 And these commands are implemented separately from the /cmd code tree (I
 guess the Command and RunCommand interfaces are for the juju run code
 discussed above.)

That's almost right. It does need something analogous to the "set"
command, and that is in fact sitting right next to the "set"
configuration command. This is the "do" command in "juju do ...", and
it is not a verb belonging to the charm code. In addition to that, it
needs action-get and action-set commands, analogous to config-get and
config-set, and those are available to the charm hooks.

 That's surprising, FWIW -- on that side note, one scalable alternative to
 parallel SSH for remote exec is ZeroMQ, which is really effective in

We already have a comprehensive mechanism to distribute requests to
the unit agents. The main surprise is that it's not being used in this
case. That said, if we are to discuss this, let's please start a new
thread as this is a completely independent subject.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: arresting bootstrap teardown

2014-03-24 Thread Gustavo Niemeyer
How about --keep-on-error?

On Mon, Mar 24, 2014 at 3:00 PM, roger peppe rogpe...@gmail.com wrote:
 If anyone, like me, has been frustrated when debugging
 bootstrap failures and having the bootstrap
 machine torn down immediately on failure,
 a quick and relatively easy workaround for that
 is to kill -STOP the juju bootstrap process
 while it's doing the ssh commands.

 You'll continue to see the ssh commands execute,
 but the parent process will stop when they finish,
 allowing you time to ssh into the bootstrap machine
 and inspect it.

 kill -CONT to allow the process to complete its cleanup.

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: 
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Upcoming import change for loggo

2014-03-05 Thread Gustavo Niemeyer
On Wed, Mar 5, 2014 at 5:12 PM, Nate Finch nate.fi...@canonical.com wrote:
 For the record, I'm not a fan of duplicating the package name of anything in
 the standard library.   Obviously, sometimes collisions will happen if a new
 package is added to the standard library, but it seems like a bad idea to do
 it on purpose.  When you're deep in the middle of a file, and you see

 log.Printf()

That looks like a pretty interesting example of when a matching
package name *is* a good idea. If I was able to just switch a logging
package and be able to have things working seamlessly, I'd love it.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Go Style Guide

2014-02-20 Thread Gustavo Niemeyer
On Thu, Feb 20, 2014 at 5:31 PM, Nate Finch nate.fi...@canonical.com wrote:
 One thing that I thought was very interesting was using import dot to get
 around circular references for tests.  I actually hit this exact problem
 just yesterday.

 https://code.google.com/p/go-wiki/wiki/Style#Import_Dot

I prefer to import the package by its own name, even when there are no
circular dependencies.
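
For example, a minimal external test that imports the package under
test by its own name rather than with import dot (the package path and
function are made up for illustration):

    package fruit_test

    import (
        "testing"

        "example.org/fruit" // hypothetical package under test
    )

    func TestName(t *testing.T) {
        // The fruit. prefix makes it clear which identifiers come from the
        // package under test; with `. "example.org/fruit"` that context
        // would be lost.
        if fruit.Name() == "" {
            t.Fatal("expected a non-empty name")
        }
    }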


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Go Style Guide

2014-02-20 Thread Gustavo Niemeyer
On Thu, Feb 20, 2014 at 6:00 PM, Nate Finch nate.fi...@canonical.com wrote:
 Well, nevermind.  That's just terrible.  It's just black box testing the
 same as any external tests, except obfuscated because you're not using the
 package name.  I don't know why you'd ever want to do that.

Right, exactly.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: New juju-mongodb package

2013-11-28 Thread Gustavo Niemeyer
Thanks for pushing this, James.

It would be good to have the mongo binary available and working as
well, also under that juju-specific namespace. This is the console
client, and will be useful to connect to the local juju database when
debugging issues.

On Thu, Nov 28, 2013 at 11:46 AM, James Page james.p...@canonical.com wrote:

 Hi Folks

 I've started working on the new, stripped down, juju specific MongoDB
 package that we have been discussing over the last few weeks.

 I'm proposing a package structure like this:

   ./usr/lib/juju/bin/mongos
   ./usr/lib/juju/bin/mongod

 No users will be created; it's just the binaries; upstart and general
 system configuration such as creating users will be the responsibility
 of juju.

 The mongod and mongos binaries will be provided in a juju namespaced
 location to avoid conflicting with the standard mongodb package; v8
 will be linked statically using the embedded copy of v8 in the mongodb
 source code - this avoids exposing v8 generally in main and allows the
 security team to manage mongodb/v8 in the context of its use with
 juju, rather than in more broad general use.

 The plan is that we will apply for a minor release exception for this
 package, and that if need be we can update to a new major release (2.6
 for example) at some point in the future without impacting the rest of
 the distro by bumping the standard mongodb package.

 The total compressed package size is about 7MB - expanding to about
 23MB on disk.

 I still need to do some work on getting the embedded v8 copy to build
 for armhf (MongoDB upstream strip this out) - arm64 has been discussed
 but that's going to need some work upstream to enable v8 for this
 architecture.

 Other bugs pertinent to MongoDB/juju usage would include:

   https://bugs.launchpad.net/juju-core/+bug/1208430

 I'm pretty sure that running mongodb not as root will be part of the
 security team signoff on the MIR review.

 Cheers

 James

 - --
 James Page
 Technical Lead
 Ubuntu Server Team
 james.p...@canonical.com

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: 
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Deleting code from goyaml

2013-11-14 Thread Gustavo Niemeyer
<davecheney> wallyworld_: i fixed the bug, tests all pass
<davecheney> by deleting code
<davecheney> i'm not sure how gustavo will like that :)
<wallyworld_> davecheney: ah, ok. good luck :-)

For the record, please don't delete apparently unused logic from the
*c.go files in goyaml, unless you went deep into the subject and
justified accordingly in the proposal.

There is certainly a non-trivial number of uncovered paths, because
these files were ported from the C libyaml. For that reason, goyaml
will definitely have uncovered paths, not only because we may be
lacking tests, but also because we may be lacking the feature itself
at the moment (for example, multi-document parsing). We should evolve
towards having more tests and more of these features covered, instead
of nuking the logic without proper analysis that it was unnecessary in
C also.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Deleting code from goyaml

2013-11-14 Thread Gustavo Niemeyer
I don't think the facts I brought up were clear, independently of
what the MP does ("For the record ...").

On Thu, Nov 14, 2013 at 9:48 AM, Ian Booth ian.bo...@canonical.com wrote:
 There was no deleted code in the MP that I saw:

 https://code.launchpad.net/~dave-cheney/goyaml/goyaml/+merge/195162

 Dave may have been referring on irc to an earlier iteration of his work.
 His approach was also discussed at the Juju team meeting, and unless I
 mis-remember, there was broad approval of the approach taken.

 On 14/11/13 21:33, Gustavo Niemeyer wrote:
 davecheney wallyworld_: i fixed the bug, tests all pass
 davecheney by deleting code
 davecheney i'm not sure how gustavo will like that :)
 wallyworld_ davecheney: ah, ok. good luck :-)

 For the record, please don't delete apparently unused logic from the
 *c.go files in goyaml, unless you went deep into the subject and
 justified accordingly in the proposal.

 There is certainly a non-trivial number of uncovered paths, because
 these files were ported from the C libyaml. For that reason, goyaml
 will definitely have uncovered paths, not only because we may be
 lacking paths, but also because we may be lacking the feature itself
 at the moment (for example, multi-document parsing). We should evolve
 towards having more tests and more of these features covered, instead
 of nuking the logic without proper analysis that it was unnecessary in
 C also.


 gustavo @ http://niemeyer.net


 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: 
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 8:31 AM, John Arbash Meinel
j...@arbash-meinel.com wrote:
 I would probably avoid putting such an emphasis on "any machine can be
 a manager machine". But that is my personal opinion. (If you want HA
 you probably want it on dedicated nodes.)

Resource waste holds juju back for the small users. Being able to
share a state server with other resources does sound attractive from
that perspective. It may be the difference between running 3 machines
or 6.

 I would probably also remove the machine if the only thing on it was
 the management. Certainly that is how people want us to do juju
 remove-unit.

If there are other units in the same machine, we should definitely not
remove the machine on remove-unit. The principle sounds the same with
state servers.

 The main problem with this is that it feels slightly too easy to add
 just 1 machine and then not actually have HA (mongo stops allowing
 writes if you have a 2-node cluster and lose one, right?)

+1


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
These are *very* good points, Mark. Taking them to heart will
definitely lead into a good direction for the overall feature
development.

It sounds like we should avoid using a "management" command for
anything in juju, though. Most things in juju are about management one
way or another, so "juju management" becomes very unclear and hard
to search for.

Instead, the command might be named after what we've been calling them:

juju add-state-server -n 2

For implementation convenience's sake, it would be okay to only ever
accept -n 2 when this is first released. I can also imagine the
behavior of this command resembling add-unit in a few aspects, since a
state server is in fact code that just needs a home to run in. This
may yield other common options across them, such as machine selection.


On Fri, Nov 8, 2013 at 6:47 AM, Mark Canonical Ramm-Christensen
mark.ramm-christen...@canonical.com wrote:
 I have a few high level thoughts on all of this, but the key thing I want to
 say is that we need to get a meeting setup next week for the solution to get
 hammered out.

 First, conceptually, I don't believe the user model needs to match the
 implementation model.  That way lies madness -- users care about the things
 they care about and should not have to understand how the system works to
 get something basic done. See:
 http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
 reasons why I call this madness.

 For that reason I think the path of adding a --jobs flag to add-machine is
 not a move forward.  It is exposing implementation detail to users and
 forcing them into a more complex conceptual model.

 Second, we don't have to boil the ocean all at once. An ensure-ha command
 that sets up additional server nodes is better than what we have now --
 nothing.  Nate is right, the box need not be black, we could have an juju
 ha-status command that just shows the state of HA.   This is fundamentally
 different than changing the behavior and meaning of add-machines to know
 about juju jobs and agents and forcing folks to think about that.

 Third, we I think it is possible to chart a course from ensure-ha as a
 shortcut (implemented first) to the type of syntax and feature set that
 Kapil is talking about.  And let's not kid ourselves, there are a bunch of
 new features in that proposal:

  * Namespaces for services
  * support for subordinates to state services
  * logging changes
  * lifecycle events on juju jobs
  * special casing the removal of services that would kill the environment
  * special casing the status to know about HA and warn for even state server
 nodes

 I think we will be adding a new concept and some new syntax when we add HA
 to juju -- so the idea is just to make it easier for users to understand,
 and to allow a path forward to something like what Kapil suggests in the
 future.   And I'm pretty solidly convinced that there is an incremental path
 forward.

 Fourth, the spelling "ensure-ha" is probably not a very good idea; the
 cracks in that system (like taking a -n flag, and dealing with failed
 machines) are already apparent.

 I think something like Nick's proposal for add-manager would be better.
 Though I don't think that's quite right either.

 So, I propose we add one new idea for users -- a state-server.

 then you'd have:

 juju management --info
 juju management --add
 juju management --add --to 3
 juju management --remove-from

 I know this is not following the add-machine format, but I think it would be
 better to migrate that to something more like this:

 juju machine --add

 --Mark Ramm





 On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com
 wrote:

 On 6 November 2013 20:07, Kapil Thangavelu
 kapil.thangav...@canonical.com wrote:
  instead of adding more complexity and concepts, it would be ideal if we
  could reuse the primitives we already have. ie juju environments have
  three
  user exposed services, that users can add-unit / remove-unit etc.  they
  have
  a juju prefix and therefore are omitted by default from status listing.
  That's a much simpler story to document. how do i scale my state
  server..
  juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.

 I have a lot of sympathy with this point of view. I've thought about
 it quite a bit.

 I see two possibilities for implementing it:

 1) Keep something like the existing architecture, where machine agents can
 take on managerial roles, but provide a veneer over the top which
 specially interprets service operations on the juju built-in services
 and translates them into operations on machine jobs.

 2) Actually implement the various juju services as proper services.

 The difficulty I have with 1) is that there's a significant mismatch
 between
 the user's view of things and what's going on underneath.
 For instance, with a built-in service, can I:

 - add a subordinate service to it?
 - see the relevant log file in the usual place for a unit?
 - see its 

Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 9:39 AM, Nate Finch nate.fi...@canonical.com wrote:
 If you only have 3 machines, do you really need HA from juju? You don't have
 HA from your machines that are actually running your service.

Why not? I have three machines..

 Yeah, same here. I still think we need a turn on HA mode command that'll
 bring you to 3 servers.  It doesn't have to be the swiss army knife that we
 said before... just something to go from non-HA to valid HA environment.

This looks fine:

juju add-state-server -n 2

It's easy to error if current + n is not a good number.
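
A tiny sketch of what "error if current + n is not a good number" might
look like (purely illustrative; the actual command and flags were still
under discussion in this thread):

    package example

    import "fmt"

    // validateStateServerCount rejects totals that would leave the replica
    // set without a sensible voting majority: anything below one, and any
    // even number above one.
    func validateStateServerCount(current, n int) error {
        total := current + n
        if total < 1 {
            return fmt.Errorf("cannot reduce state servers to %d", total)
        }
        if total > 1 && total%2 == 0 {
            return fmt.Errorf("%d state servers is not a sensible number, use an odd number", total)
        }
        return nil
    }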


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
We'll end up with a command that adds a state server, with a replica
of the database and an API server. That's the notion of state server
we've been using all along, and sounds quite reasonable, easy to
explain and understand.

On Fri, Nov 8, 2013 at 10:15 AM, roger peppe roger.pe...@canonical.com wrote:
 On 8 November 2013 12:03, Gustavo Niemeyer gust...@niemeyer.net wrote:
 Splitting API and db at some point sounds sensible, but it may be easy and
 convenient to think about a state server as API+db for the time being.

 I'd prefer to start with a command name that implies that possibility;
 otherwise we'll end up either with a command that doesn't
 describe what it actually does, or more very similar commands
 where one could be sufficient.

 Hence my discomfort with add-state-server as a command name.



-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
juju add-state-server --api-only-please-thanks




On Fri, Nov 8, 2013 at 11:43 AM, roger peppe roger.pe...@canonical.com wrote:
 On 8 November 2013 13:33, Gustavo Niemeyer gust...@niemeyer.net wrote:
 We'll end up with a command that adds a state server, with a replica
 of the database and an API server. That's the notion of state server
 we've been using all along, and sounds quite reasonable, easy to
 explain and understand.

 And when we want to split API and db, as you thought perhaps
 might be sensible at some point, what then?



-- 

gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 12:04 PM, roger peppe roger.pe...@canonical.com wrote:
 On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote:
 juju add-state-server --api-only-please-thanks

 And if we want to allow a machine that runs the environment-manager
 workers but not the api server or mongo server (not actually an unlikely thing
 given certain future possibilities) then add-state-server is a command that
 doesn't necessarily add a state server at all... That thought
 was the source of my doubt.

The fact you can organize things a thousand ways doesn't mean we
should offer a thousand knobs. A state server is a good abstraction
for there are management routines running there. You can define what
that means, as long as you don't let things fall down when N/2-1
machines fall down.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
It doesn't feel like the difference between

juju ensure-ha --prefer-machines 11,37

and

juju add-state-server --to 11,37

is worth the amount of reasoning there.  I'm clearly in favor of the
latter, but I wouldn't argue so much for it.


On Fri, Nov 8, 2013 at 2:00 PM, William Reade
william.re...@canonical.com wrote:
 I'm concerned that we're (1) rehashing decisions made during the sprint and
 (2) deviating from requirements in doing so.

 In particular, abstracting HA away into management manipulations -- as
 roger notes, pretty much isomorphic to the jobs proposal -- doesn't give
 users HA so much as it gives them a limited toolkit with which they can
 more-or-less construct their own HA; in particular, allowing people to use
 an even number of state servers is strictly a bad thing [0], and I'm
 extremely suspicious of any proposal that opens that door.

 Of course, some will argue that mongo should be able to scale separately
 from the api servers and other management tasks, and this is a worthy goal;
 but in this context it sucks us down into the morass of exposing different
 types of management on different machines, and ends up approaching the jobs
 proposal still closer, in that it requires users to assimilate a whole load
 of extra terminology in order to perform a conceptually simple function.

 Conversely, ensure-ha (with possible optional --redundancy=N flag,
 defaulting to 1) is a simple model that can be simply explained: the
 command's sole purpose is to ensure that juju management cannot fail as a
 result to the simultaneous failure of =N machines. It's a *user-level*
 construct that will always be applicable even in the context of a more
 sophisticated future language (no matter what's going on with this
 complicated management/jobs business, you can run that and be assured you'll
 end up with at least enough manager machines to fulfil the requirement you
 clearly stated in the command line).

 I haven't seen anything that makes me think that redesigning from scratch is
 in any way superior to refining what we already agreed upon; and it's
 distracting us from the questions of reporting and correcting manager
 failure when it occurs. I assert the following series of arguments:

 * users may discover at any time that they need to make an existing
 environment HA, so ensure-ha is *always* a reasonable user action
 * users who *don't* need an HA environment can, by definition, afford to
 take the environment down and reconstruct it without HA if it becomes
 unimportant
 * therefore, scaling management *down* is not the highest priority for us
 (but is nonetheless easily amenable to future control via the ensure-ha
 command -- just explicitly set a lower redundancy number)
 * similarly, allowing users to *directly* destroy management machines
 enables exciting new failure modes that don't really need to exist

 * the notion of HA is somewhat limited in worth when there's no way to make
 a vulnerable environment robust again
 * the more complexity we shovel onto the user's plate, the less likely she
 is to resolve the situation correctly under stress
 * the most obvious, and foolproof, command for repairing HA would be
 ensure-ha itself, which could very reasonably take it upon itself to
 replace manager nodes detected as down -- assuming a robust presence
 implementation, which we need anyway, this (1) works trivially for machines
 that die unexpectedly and (2) allows a backdoor for resolution of weird
 situations: the user can manually shutdown a misbehaving manager
 out-of-band, and run ensure-ha to cause a new one to be spun up in its
 place; once HA is restored, the old machine will no longer be a manager, no
 longer be indestructible, and can be cleaned up at leisure

 * the notion is even more limited when you can't even tell when something
 goes wrong
 * therefore, HA state should *at least* be clearly and loudly communicated
 in status
 * but that's not very proactive, and I'd like to see a plan for how we're
 going to respond to these situations when we detect them

 * the data accessible to a manager node is sensitive, and we shouldn't
 generally be putting manager nodes on dirty machines; but density is an
 important consideration, and I don't think it's confusing to allow
 preferred machines to be specified in ensure-ha, such that *if*
 management capacity needs to be added it will be put onto those machines
 before finding clean ones or provisioning new ones
 * strawman syntax: juju ensure-ha --prefer-machines 11,37 to place any
 additional manager tasks that may be required on the supplied machines in
 order of preference -- but even this falls far behind the essential goal,
 which is to make HA *easy* for our users.
 * (ofc, we should continue not to put units onto manager machines by
 default, but allow them when forced with --to as before)

 I don't believe that any of this precludes more sophisticated management of
 juju's internal functions *when* the need becomes 

Re: Scale Testing: Now with profiling!

2013-11-04 Thread Gustavo Niemeyer
On Mon, Nov 4, 2013 at 12:04 PM, John Arbash Meinel
j...@arbash-meinel.com wrote:

 On 2013-11-04 17:52, roger peppe wrote:
 There's no point in salting the agent passwords, and we can't
 easily change things to salt the user passwords until none of the
 command line tools talk directly to mongo, so I'm +1 on john's
 patch for now.

 We can absolutely salt both. *Salt* is all about reading the salt from
 what you've stored in the DB and using that to compute the hash. It is
 simply there to prevent rainbow table attacks (precomputing the hashes of
 1M common user passwords and comparing them to the content in the DB).

Roger was talking about the agent passwords, which you described as
nice long random strings. There's no common user password in that case.
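
For reference, a minimal sketch of the salt-then-hash scheme John describes,
using SHA-256 for brevity -- a real implementation would more likely reach for
a slow KDF such as bcrypt or scrypt, and none of the names below are juju's:

    package main

    import (
        "crypto/rand"
        "crypto/sha256"
        "crypto/subtle"
        "encoding/hex"
        "fmt"
    )

    // hashPassword returns a fresh random salt plus the hex-encoded
    // SHA-256 of salt+password; both would be stored in the DB.
    func hashPassword(password string) (salt, hash string, err error) {
        raw := make([]byte, 16)
        if _, err = rand.Read(raw); err != nil {
            return "", "", err
        }
        salt = hex.EncodeToString(raw)
        sum := sha256.Sum256([]byte(salt + password))
        return salt, hex.EncodeToString(sum[:]), nil
    }

    // checkPassword recomputes the hash from the stored salt and compares
    // in constant time -- the per-user salt defeats precomputed tables.
    func checkPassword(password, salt, storedHash string) bool {
        sum := sha256.Sum256([]byte(salt + password))
        return subtle.ConstantTimeCompare(
            []byte(hex.EncodeToString(sum[:])), []byte(storedHash)) == 1
    }

    func main() {
        salt, hash, _ := hashPassword("sekrit")
        fmt.Println(checkPassword("sekrit", salt, hash)) // true
        fmt.Println(checkPassword("guess", salt, hash))  // false
    }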


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Notes from Scale testing

2013-10-30 Thread Gustavo Niemeyer
On Wed, Oct 30, 2013 at 6:23 AM, John Arbash Meinel
j...@arbash-meinel.com wrote:
 I'm trying to put together a quick summary of what I've found out so
 far with testing juju in an environment with thousands (5000+) agents.

Great testing, John.

 2) Agents seem to consume about 17MB resident according to 'top'. That
 should mean we can run ~450 agents on an m1.large. Though in my
 testing I was running ~450 and still had free memory, so I'm guessing
 there might be some copy-on-write pages (17MB is very close to the
 size of the jujud binary).

Yeah, RSS is not straightforward to measure correctly. The crude
readings are pretty much always overestimated.

 4) If I bring up the units one by one (for i in `seq 500`; do for j in
 `seq 10`; do juju add-unit --to $j & done; time wait; done), it ends up
 triggering O(N^2) behavior in the system. Each unit agent seems to
 have a watcher for other units of the same service. So when you add 1
 unit, it wakes up all existing units to let them know about it. In
 theory this is on a 5s rate limit (only 1 wakeup per 5 seconds). In
 practice it was taking 3s per add unit call [even when requesting
 them in parallel]. I think this was because of the load on the API
 server of all the other units waking up and asking for details at the
 same time.

In theory, answering all of those watcher queries should eventually be very
cheap, if it isn't right now. The yes/no data for several thousand
units easily fits all in memory, and the API servers also learn about
changes as they go through to the database, so there's no big reason
to touch the database for these operations.

Caching will also play a larger role inside the API servers as juju
moves towards scalability.
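
A hypothetical sketch of such an in-memory cache on the API server's write
path (all names invented; this is not juju's actual code):

    package main

    import (
        "fmt"
        "sync"
    )

    type unitInfo struct {
        Life     string
        CharmURL string
    }

    // unitCache is the kind of cache described above: writes update it as
    // they pass through to the database, so watcher queries can be answered
    // from memory without touching mongo.
    type unitCache struct {
        mu    sync.RWMutex
        units map[string]unitInfo
    }

    func newUnitCache() *unitCache {
        return &unitCache{units: make(map[string]unitInfo)}
    }

    // update runs on the write path, after the change has been persisted.
    func (c *unitCache) update(name string, info unitInfo) {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.units[name] = info
    }

    // lookup answers a watcher's question from memory.
    func (c *unitCache) lookup(name string) (unitInfo, bool) {
        c.mu.RLock()
        defer c.mu.RUnlock()
        info, ok := c.units[name]
        return info, ok
    }

    func main() {
        cache := newUnitCache()
        cache.update("wordpress/0", unitInfo{Life: "alive", CharmURL: "cs:wordpress-12"})
        fmt.Println(cache.lookup("wordpress/0"))
    }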

 - From what I can tell, all units take out a watch on their service so
 that they can monitor its Life and CharmURL. However, adding a unit to
 a service triggers a change on that service, even though Life and
 CharmURL haven't changed. If we split out Watching the
 units-on-a-service from the lifetime and URL of a service, we could
 avoid the N^2 thundering-herd problem while starting up a bunch of
 units. Though UpgradeCharm is still going to cause a thundering herd.

Where is N^2 coming from?

 It then seems to restart the Unit agent, which goes through the steps
 of making all the same requests again. (Get the Life of my Unit, get
 the Life of my service, get the UUID of this environment, etc., there
 are 41 requests before it gets to APIAddress)

Ugh!

 I would be fine doing max(1, NumCPUs()-1) or something similar. I'd
 rather do it inside jujud than in the cloud-init script,
 because computing NumCPUs is easier there. But we should have *a* way
 to scale up the central node that isn't just scaling out to more API
 servers.

NumCPUs sounds like a fine initial setting.
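
A minimal sketch of what the max(1, NumCPUs()-1) setting could look like
inside jujud (illustrative only; at the time Go defaulted GOMAXPROCS to 1):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // Use all but one CPU, but never fewer than one.
        n := runtime.NumCPU() - 1
        if n < 1 {
            n = 1
        }
        prev := runtime.GOMAXPROCS(n)
        fmt.Printf("GOMAXPROCS: %d -> %d\n", prev, n)
    }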

 I certainly think we need a way to scale Mongo as well. If it is just
 1 CPU per connection then scaling horizontally with API servers should
 get us around that limit.

It's one thread per connection.

 10) Allowing juju add-unit -n 100 --to X did make things a lot
 easier to bring up. Though it still takes a while for the request to
 finish. It felt like the api call triggered work to start happening in
 the background which made the current api call take longer to finally
 complete. (as in, minutes once we had 1000 units).

It doesn't sound like optimizing for a huge volume of immediate units
is an important goal, other than for streamlining that kind of
testing. I have very rarely observed use cases that do that, or that
have resources available at all for doing that. Even large providers
will generally deny requests for large deltas at once, due to their
controlled growth schedules.

It sounds more relevant to have juju able to cope with these loads
effectively and in a timely fashion once they do reach such scales.

 Not everything in there is worth landing in trunk (rudimentary API
 caching, etc).

 That's all I can think of for now, though I think there is more to be
 explored.

Again, well done.


gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Control different relation sequence

2013-09-04 Thread Gustavo Niemeyer
Exactly, that's what I would probably do as well. Once you are within
a relation whose further actions you want to defer, dump the
$JUJU_RELATION_ID into a file and read it back when you want to wake
that relation up again.  Hooks are guaranteed to run in series, so you
don't have to worry about concurrency issues around the file.
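
A sketch of the two halves of that approach -- one hook saving
$JUJU_RELATION_ID, a later hook replaying a setting against it with
relation-set -r. Hooks are usually shell scripts; Go is used here only to
keep the examples in one language, and the file path and the ready=true
setting are invented for illustration:

    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "strings"
    )

    const savedIDPath = "/var/lib/myapp/deferred-relation-id" // hypothetical

    // saveRelationID runs inside a relation hook that is not ready to act
    // yet: it just records which relation woke it up.
    func saveRelationID() error {
        id := os.Getenv("JUJU_RELATION_ID")
        if id == "" {
            return fmt.Errorf("not running inside a relation hook")
        }
        return os.WriteFile(savedIDPath, []byte(id+"\n"), 0o644)
    }

    // wakeRelation runs later, from another hook, once the charm is ready:
    // setting data on the saved relation triggers -changed on the far side.
    func wakeRelation() error {
        data, err := os.ReadFile(savedIDPath)
        if err != nil {
            return err
        }
        id := strings.TrimSpace(string(data))
        cmd := exec.Command("relation-set", "-r", id, "ready=true")
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        return cmd.Run()
    }

    func main() {
        // In a real charm the hook name would decide which path runs.
        var err error
        if os.Getenv("JUJU_RELATION_ID") != "" {
            err = saveRelationID()
        } else {
            err = wakeRelation()
        }
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }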

On Wed, Sep 4, 2013 at 12:01 AM, Mike Sam mikesam...@gmail.com wrote:
 Thanks. Does this mean that the charm should cache the relation id's in a
 text file or something?


 On Tue, Sep 3, 2013 at 7:33 PM, Gustavo Niemeyer
 gustavo.nieme...@canonical.com wrote:

 The relation-set command accepts a -r parameter which takes the relation
 id
 to act upon. You can pick the relation id of an executing hook from
 the JUJU_RELATION_ID environment variable. This way you can act across
 relations.

 Hopefully this will be better documented at some point.

 On Tue, Sep 3, 2013 at 11:23 PM, Mike Sam mikesam...@gmail.com wrote:
  Thanks Gustavo but I did not quite get your point. The problem is that
  for
  the new unit for service A, the dependent hooks are on two different
  independent relationships. I mean I can control when the new unit of
  Service
  A has properly established a relation with all the units of service B on
  say
  relation x_relation_changed, but how do I make all the units of service
  C to
  now trigger the y_relation_changed hook of the Service A unit because
  the
  unit is ready to process them? How do I make y_relation_changed hook to
  get
  triggered AGAIN (in case it has already been triggered but ignored
  because
  relation with service B was not done setting up) when x_relation_changed
  see
   fit? Would you please explain your point in the Service A, B, C context
  of
  my example?
 
 
 
 
  On Tue, Sep 3, 2013 at 6:38 PM, Gustavo Niemeyer
  gustavo.nieme...@canonical.com wrote:
 
  Hi Mike,
 
  You cannot control the sequence in which the hooks are executed, but
  you have full control of what you do when the hooks do execute. You
   can choose to send nothing to the other side of the relation until it's
  time to report that a connection may now be established, and when you
  do change the relation, the remote hook will run again to report the
  change.
 
  On Tue, Sep 3, 2013 at 10:17 PM, Mike Sam mikesam...@gmail.com wrote:
   Imagine a unit needs to be added to an existing service like service
   A.
   Service A is already in relations with other services like Service B
   and
   Service C on different requires.
  
   For the new unit on Service A to work, it needs to first process the
   relation_joined and relation_changed with the units of service B
   before
   it
   could process  relation_joined and relation_changed with the units of
   service C.
  
   Is there a way to enforce such desired sequence relationship
   establishment
   at the charm level? In other words, I do not think we can control the
   hook
   execution sequence of different relationships officially but then I
   am
   wondering how can we do a situation like above nicely?
  
   Thanks,
   Mike
  
  
  
  
  
   --
   Juju-dev mailing list
   Juju-dev@lists.ubuntu.com
   Modify settings or unsubscribe at:
   https://lists.ubuntu.com/mailman/listinfo/juju-dev
  
 
  --
  gustavo @ http://niemeyer.net
 
 



 --
 gustavo @ http://niemeyer.net





-- 
gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Control different relation sequence

2013-09-03 Thread Gustavo Niemeyer
Hi Mike,

You cannot control the sequence in which the hooks are executed, but
you have full control of what you do when the hooks do execute. You
can choose to send nothing to the other side of the relation until it's
time to report that a connection may now be established, and when you
do change the relation, the remote hook will run again to report the
change.

On Tue, Sep 3, 2013 at 10:17 PM, Mike Sam mikesam...@gmail.com wrote:
 Imagine a unit needs to be added to an existing service like service A.
 Service A is already in relations with other services like Service B and
 Service C on different requires.

 For the new unit on Service A to work, it needs to first process the
 relation_joined and relation_changed with the units of service B before it
 could process  relation_joined and relation_changed with the units of
 service C.

 Is there a way to enforce such desired sequence relationship establishment
 at the charm level? In other words, I do not think we can control the hook
 execution sequence of different relationships officially but then I am
 wondering how can we do a situation like above nicely?

 Thanks,
 Mike





 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
gustavo @ http://niemeyer.net

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev