Re: RFC: mongo "_id" fields in the multi-environment juju server world

roger peppe Fri, 04 Jul 2014 02:02:24 -0700

On 4 July 2014 02:01, Tim Penhey <tim.pen...@canonical.com> wrote:
> Hi folks,
>
> Very shortly we are going to start on the work to be able to store
> multiple environments within a single mongo database.
>
> Most of our current entities are stored in the database with their name
> or id fields serialized to bson as the _id field.
>
> As far as I know (and I may be wrong), if you are adding a document to
> the mongo collection, and you do not specify an _id field, mongo will
> create a unique value for you.
>
> In our new world, things that used to be unique, like machines,
> services, units etc, are now only unique when paired with the
> environment id.
>
> It seems we have a number of options here.
>
> 1. change the _id field to be a "composed" field where it is the
> concatenation of the environment id and the existing id or name field.
> If we do take this approach, I strongly recommend having the fields that
> make up the key be available by themselves elsewhere in the document
> structure.
>
> 2. let mongo create the _id field, and we ensure uniqueness over the
> pair of values with a unique index. One thing I am unsure about with
> this approach is how we currently do our insertion checks, where we do a
> "document does not exist" check.  We wouldn't be able to do this as a
> transaction assertion as it can only check for _id values.  How fast are
> the indices updated?  Can having a unique index for a document work for
> us?  I'm hoping it can if this is the way to go.
>
> 3. use a composite _id field such that the document may start like this:
>   { _id: { env_uuid: "blah", name: "foo"}, ...
> This gives the benefit of existence checks, and real names for the _id
> parts.
>
> Thoughts? Opinions? Recommendations?
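
To make the three quoted layouts concrete, here is a rough sketch of how the documents might look as Go structs with bson tags, as mgo would serialise them. The type names, field names and the ":" separator are hypothetical, not juju's actual schema. Only the key-composition helpers carry any logic; the struct tags are plain strings and need no import to compile.

```go
package main

import "strings"

// idSeparator is a hypothetical separator for option 1. Canonical UUIDs
// contain only hex digits and '-', so the first ':' always ends the
// environment part of the key.
const idSeparator = ":"

// Option 1: a composed string _id. The parts are also stored as their own
// fields, as the quoted message recommends, so queries need not parse _id.
type machineDocV1 struct {
	DocID     string `bson:"_id"` // "<env-uuid>:<machine-id>"
	EnvUUID   string `bson:"env-uuid"`
	MachineID string `bson:"machineid"`
}

// composeID builds the option 1 _id from its two parts.
func composeID(envUUID, localID string) string {
	return envUUID + idSeparator + localID
}

// splitID reverses composeID. It splits on the first separator only, so a
// local id that itself contains ':' survives the round trip.
func splitID(docID string) (envUUID, localID string, ok bool) {
	parts := strings.SplitN(docID, idSeparator, 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// newMachineDocV1 fills in the composed _id and the separate fields together
// so they cannot drift apart.
func newMachineDocV1(envUUID, machineID string) machineDocV1 {
	return machineDocV1{
		DocID:     composeID(envUUID, machineID),
		EnvUUID:   envUUID,
		MachineID: machineID,
	}
}

// Option 2: no _id field, so mongo generates an ObjectId on insert.
// Uniqueness would come from a unique index over both fields; with mgo,
// something like mgo.Index{Key: []string{"env-uuid", "machineid"}, Unique: true}.
type machineDocV2 struct {
	EnvUUID   string `bson:"env-uuid"`
	MachineID string `bson:"machineid"`
}

// Option 3: a sub-document _id. MongoDB compares embedded documents field by
// field in order, so {env_uuid, name} and {name, env_uuid} are different
// keys; mgo/bson encodes a struct's fields in declaration order, which keeps
// every writer consistent.
type compositeID struct {
	EnvUUID string `bson:"env_uuid"`
	Name    string `bson:"name"`
}

type serviceDocV3 struct {
	ID compositeID `bson:"_id"`
}
```

Options 1 and 3 keep the environment inside `_id`, so `_id`-based existence checks keep working; option 2 moves uniqueness into an index instead.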


There is another possibility: we could just use a different collection name prefix
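for each environment. There is no fixed cap on the number of collections
as such in mongo, although the namespace file does bound how many
collections and indexes a single database can hold (about 24,000 with the
default 16MB file; see http://docs.mongodb.org/manual/reference/limits/).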

That is, instead of using the current hard-coded collection names
("machines", "relations", etc.) we'd prefix them with the environment id:
either the UUID or an id stored elsewhere.
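
A minimal sketch of the prefixing idea, assuming a hypothetical "<prefix>.<name>" naming scheme. The function names are illustrative, not existing juju code. The validation covers MongoDB's documented restrictions on collection names (non-empty, no '$' or null character, no "system." prefix); it does not check the overall namespace length limit.

```go
package main

import (
	"errors"
	"strings"
)

// envCollectionName returns the per-environment name for one of the
// existing hard-coded collections, such as "machines". The
// "<prefix>.<base>" scheme is hypothetical; any per-environment prefix
// (the UUID or an id stored elsewhere) would do.
func envCollectionName(envPrefix, base string) (string, error) {
	if envPrefix == "" || base == "" {
		return "", errors.New("empty environment prefix or collection name")
	}
	name := envPrefix + "." + base
	// MongoDB rejects collection names containing '$' or the null
	// character, and reserves the "system." prefix.
	if strings.ContainsAny(name, "$\x00") {
		return "", errors.New("collection name contains '$' or a null character")
	}
	if strings.HasPrefix(name, "system.") {
		return "", errors.New("collection name uses the reserved system. prefix")
	}
	return name, nil
}

// envCollections maps each base collection name to its per-environment
// name, so state code could look names up here instead of using constants.
func envCollections(envPrefix string, bases []string) (map[string]string, error) {
	names := make(map[string]string, len(bases))
	for _, base := range bases {
		name, err := envCollectionName(envPrefix, base)
		if err != nil {
			return nil, err
		}
		names[base] = name
	}
	return names, nil
}
```

With mgo, each resulting name would then be passed to `Database.C` in place of today's constant, which is why the change to existing code could stay small.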

This would entail very few changes to the existing code.

If most operations on an environment will continue to be specific to
that environment, I think this approach has a few advantages.
Specifically, it minimises cross-talk between environments: one
large environment with heavy traffic will not unduly influence the others.

- for a small environment, collection indexes remain small and lookups
fast, even though the total number of entries across all environments
might be huge.

- each environment could have a separate mongo txn log, so one busy
environment that's constantly adding transactions will not necessarily
slow down all the others. There is, in general, no need for sequential
consistency between environments.

- database isolation between environments is an advantage when things
go wrong: it's easier to fix or delete individual environments if their
collections are isolated from one another.
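
The separate-txn-log and isolation points can be sketched with the same hypothetical "<prefix>.<name>" scheme. An environment's own transaction collections are just two more prefixed names, and finding everything that belongs to one environment (for repair or deletion) is a prefix match over the database's collection list. The names "txns" and "txns.log" and both helper functions are illustrative assumptions.

```go
package main

import (
	"sort"
	"strings"
)

// txnCollections returns hypothetical names for an environment's own
// mgo/txn transaction collection and change log, keeping one
// environment's transaction documents out of every other environment's
// transaction collection.
func txnCollections(envPrefix string) (txns, changeLog string) {
	return envPrefix + ".txns", envPrefix + ".txns.log"
}

// collectionsForEnv picks out every collection that belongs to envPrefix
// from a full list of collection names. Matching on envPrefix+"." keeps
// "env1" from also matching "env10.machines". The result is sorted so
// callers get a stable order.
func collectionsForEnv(all []string, envPrefix string) []string {
	want := envPrefix + "."
	var out []string
	for _, name := range all {
		if strings.HasPrefix(name, want) {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}
```

With mgo, the full list could come from `Database.CollectionNames`, deleting an environment would amount to calling `Collection.DropCollection` on each returned name, and `txn.NewRunner` would be handed that environment's own transaction collection.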

The disadvantage is that, with a separate txn log per environment, you
can't perform transactions that span multiple environments. I think
that's something we probably would not want to do much anyway, but YMMV.

I suggest that, at the least, taking this approach would be a quick
road to making the state work with multiple environments. It would
not preclude a move to composite keys in the future.

  cheers,
    rog.

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev
