Re: RFC: mongo "_id" fields in the multi-environment juju server world

William Reade Fri, 04 Jul 2014 03:25:27 -0700

My expectation is that:

1) We certainly need the environment UUID as a separate field for the shard
key.
2) We *also* need the environment UUID as an _id prefix to keep our
watchers sane.
2a) If we had separate collections per environment, we wouldn't; but AIUI,
scaling mongo by adding collections tends to end badly (I don't have direct
experience here myself; but it does indeed seem that we'd start consuming
namespaces at a pretty terrifying rate, and I'm inclined to trust those who
have done this and failed.)
2b) I'd ordinarily dislike the duplication across the _id and uuid fields,
but there's a clear reason for doing so here, so I'm not going to complain.
I *will* continue to complain about documents that duplicate info across
fields in order to save a few runtime microseconds here and there ;).
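
As a rough illustration of (1) plus (2), here is a minimal Go sketch of a
document that carries the environment UUID both as an _id prefix and as a
separate field. The type name, the "env-uuid" field name, and the ":"
separator are all assumptions made for the example, not a settled schema.

```go
package main

import (
	"fmt"
	"strings"
)

// idSep is an assumed separator between the environment UUID and the
// environment-local id; a UUID never contains ':'.
const idSep = ":"

// machineDoc is a hypothetical document shape, not juju's actual schema.
type machineDoc struct {
	DocID   string `bson:"_id"`      // "<env-uuid>:<local-id>", unique in the collection
	EnvUUID string `bson:"env-uuid"` // duplicated on purpose so it can serve as the shard key
}

// docID composes the collection-wide unique _id.
func docID(envUUID, localID string) string {
	return envUUID + idSep + localID
}

// newMachineDoc builds a document whose two UUID copies always agree.
func newMachineDoc(envUUID, localID string) machineDoc {
	return machineDoc{DocID: docID(envUUID, localID), EnvUUID: envUUID}
}

// inEnv reports whether an _id belongs to the given environment. This is
// the kind of check a per-environment watcher can apply when all it sees
// is the _id of a changed document.
func inEnv(id, envUUID string) bool {
	return strings.HasPrefix(id, envUUID+idSep)
}

func main() {
	d := newMachineDoc("6d1f8a", "0")
	fmt.Println(d.DocID, inEnv(d.DocID, "6d1f8a"))
}
```

Including the separator in the prefix check keeps one environment's UUID
from matching another that merely starts with the same characters.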


If someone with direct experience can chip in reassuringly I *might* be
prepared to back off on the N-collections-per-environment thing, but I'm
certainly not willing to take it so far as to separate the txn logs and
thus discard consistency across environments: I think there will certainly
be references between individual hosted environments and the initial
environment.

So, in short, I think Tim's (1) is the way to go. But *please* don't
duplicate data that doesn't have to be -- the UUID is fine, the name is
not. If we really end up spending a lot of time extracting names from _id
fields we can cache them in the state documents -- but we don't need
redundant copies in the DB, and we *really* don't need to make our lives
harder by giving our data unnecessary opportunities for inconsistency.
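
To make the "extract, and cache in the state documents if needed" idea
concrete, here is a small Go sketch. It assumes the composed _id has the
form "<env-uuid>:<name>"; the separator, the type names, and the helper
names are hypothetical. The name is derived from the _id once, when the
in-memory object is built, and is never stored as its own field in the DB.

```go
package main

import (
	"fmt"
	"strings"
)

// machineDoc is a hypothetical stored document: _id plus env-uuid, with no
// separate name field.
type machineDoc struct {
	DocID   string `bson:"_id"`
	EnvUUID string `bson:"env-uuid"`
}

// localID strips the "<env-uuid>:" prefix from a composed _id. It splits on
// the first ':' only, because a UUID contains none but a local name might.
func localID(docID string) (string, bool) {
	i := strings.Index(docID, ":")
	if i < 0 {
		return "", false
	}
	return docID[i+1:], true
}

// Machine is a hypothetical in-memory state object wrapping the document.
type Machine struct {
	doc machineDoc
	id  string // cached in memory only, never written back to the DB
}

// newMachine extracts the local id once so later lookups are free.
func newMachine(doc machineDoc) (*Machine, error) {
	id, ok := localID(doc.DocID)
	if !ok {
		return nil, fmt.Errorf("malformed _id %q", doc.DocID)
	}
	return &Machine{doc: doc, id: id}, nil
}

// ID returns the environment-local id without re-parsing the _id.
func (m *Machine) ID() string { return m.id }

func main() {
	m, err := newMachine(machineDoc{DocID: "6d1f8a:0", EnvUUID: "6d1f8a"})
	if err != nil {
		panic(err)
	}
	fmt.Println(m.ID())
}
```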

Cheers
William



On Fri, Jul 4, 2014 at 6:42 AM, John Meinel <j...@arbash-meinel.com> wrote:

> According to the mongo docs:
> http://docs.mongodb.org/manual/core/document/#record-documents
> The field name _id is reserved for use as a primary key; its value must
> be unique in the collection, is immutable, and may be of any type other
> than an array.
>
> That makes it sound like we *could* use an object for the _id field and do
> _id = {env_uuid:, name:}
>
> Though I thought the purpose of doing something like that is to allow
> efficient sharding in a multi-environment world.
>
> Looking here: http://docs.mongodb.org/manual/core/sharding-shard-key/
> The shard key must be indexed (which is just fine for us w/ the primary
> _id field or with any other field on the documents), and "The index on the
> shard key *cannot* be a *multikey index*
> <http://docs.mongodb.org/manual/core/index-multikey/#index-type-multikey>."
> I don't really know what that means in the case of wanting to shard based
> on an object instead of a simple string, but it does sound like it might be
> a problem.
> Anyway, for purposes of being *unique* we may need to put environ uuid in
> there, but for the purposes of sharding we could just put it on another
> field and index that field.
>
> John
> =:->
>
>
>
> On Fri, Jul 4, 2014 at 5:01 AM, Tim Penhey <tim.pen...@canonical.com>
> wrote:
>
>> Hi folks,
>>
>> Very shortly we are going to start on the work to be able to store
>> multiple environments within a single mongo database.
>>
>> Most of our current entities are stored in the database with their name
>> or id fields serialized to bson as the _id field.
>>
>> As far as I know (and I may be wrong), if you are adding a document to
>> the mongo collection, and you do not specify an _id field, mongo will
>> create a unique value for you.
>>
>> In our new world, things that used to be unique, like machines,
>> services, units etc, are now only unique when paired with the
>> environment id.
>>
>> It seems we have a number of options here.
>>
>> 1. change the _id field to be a "composed" field where it is the
>> concatenation of the environment id and the existing id or name field.
>> If we do take this approach, I strongly recommend having the fields that
>> make up the key be available by themselves elsewhere in the document
>> structure.
>>
>> 2. let mongo create the _id field, and we ensure uniqueness over the
>> pair of values with a unique index. One thing I am unsure about with
>> this approach is how we currently do our insertion checks, where we do a
>> "document does not exist" check.  We wouldn't be able to do this as a
>> transaction assertion as it can only check for _id values.  How fast are
>> the indices updated?  Can having a unique index for a document work for
>> us?  I'm hoping it can if this is the way to go.
>>
>> 3. use a composite _id field such that the document may start like this:
>>   { _id: { env_uuid: "blah", name: "foo"}, ...
>> This gives the benefit of existence checks, and real names for the _id
>> parts.
>>
>> Thoughts? Opinions? Recommendations?
>>
>> BTW, I think that if we can make 3 work, then it is the best approach.
>>
>> Tim
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>

