My expectation is that:

1) We certainly need the environment UUID as a separate field for the shard key.

2) We *also* need the environment UUID as an _id prefix to keep our watchers sane.

2a) If we had separate collections per environment, we wouldn't; but AIUI, scaling mongo by adding collections tends to end badly (I don't have direct experience here myself; but it does indeed seem that we'd start consuming namespaces at a pretty terrifying rate, and I'm inclined to trust those who have done this and failed).

2b) I'd ordinarily dislike the duplication across the _id and uuid fields, but there's a clear reason for doing so here, so I'm not going to complain. I *will* continue to complain about documents that duplicate info across fields in order to save a few runtime microseconds here and there ;).
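
To make (1) and (2) concrete, here is a rough sketch of what such a document could look like. Everything in it is an assumption for illustration only: the `machineDoc` type, the `env-uuid` field name, the `:` separator and both helper functions are hypothetical, not existing state code. It only shows the environment UUID appearing twice (once as an _id prefix, once as its own field) while the local name lives in the _id alone and is extracted on demand.

```go
package main

import (
	"fmt"
	"strings"
)

// machineDoc is a hypothetical state document. The bson tags are plain
// struct tags here; no mongo driver is needed to run this sketch.
type machineDoc struct {
	// DocID is the composed key: "<env-uuid>:<local id>".
	DocID string `bson:"_id"`
	// EnvUUID repeats the prefix as a separate field so it can be
	// indexed and used as a shard key. The field name is an assumption.
	EnvUUID string `bson:"env-uuid"`
}

// docID composes the _id from the environment UUID and the local id.
// The ":" separator is an assumption; it is safe only as long as UUIDs
// never contain it.
func docID(envUUID, localID string) string {
	return envUUID + ":" + localID
}

// localID recovers the local id from a composed _id, so the name does
// not have to be stored a second time in the document.
func localID(docID string) (string, bool) {
	i := strings.Index(docID, ":")
	if i < 0 {
		return "", false
	}
	return docID[i+1:], true
}

func main() {
	env := "6f0e1b2c-0000-4000-8000-000000000000" // made-up UUID
	doc := machineDoc{DocID: docID(env, "0/lxc/0"), EnvUUID: env}
	name, _ := localID(doc.DocID)
	fmt.Println(doc.DocID, doc.EnvUUID, name)
}
```

A watcher that only cares about one environment could then filter on the _id prefix, and the separate field stays available for sharding.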

If someone with direct experience can chip in reassuringly I *might* be prepared to back off on the N-collections-per-environment thing, but I'm certainly not willing to take it so far as to separate the txn logs and thus discard consistency across environments: I think there will certainly be references between individual hosted environments and the initial environment.

So, in short, I think Tim's (1) is the way to go. But *please* don't duplicate data that doesn't have to be duplicated -- the UUID is fine, the name is not. If we really end up spending a lot of time extracting names from _id fields we can cache them in the state documents -- but we don't need redundant copies in the DB, and we *really* don't need to make our lives harder by giving our data unnecessary opportunities for inconsistency.

Cheers
William

On Fri, Jul 4, 2014 at 6:42 AM, John Meinel <j...@arbash-meinel.com> wrote:

> According to the mongo docs:
> http://docs.mongodb.org/manual/core/document/#record-documents
> The field name _id is reserved for use as a primary key; its value must
> be unique in the collection, is immutable, and may be of any type other
> than an array.
>
> That makes it sound like we *could* use an object for the _id field and do
> _id = {env_uuid:, name:}
>
> Though I thought the purpose of doing something like that is to allow
> efficient sharding in a multi-environment world.
>
> Looking here: http://docs.mongodb.org/manual/core/sharding-shard-key/
> The shard key must be indexed (which is just fine for us w/ the primary
> _id field or with any other field on the documents), and "The index on the
> shard key *cannot* be a *multikey index*
> <http://docs.mongodb.org/manual/core/index-multikey/#index-type-multikey>".
> I don't really know what that means in the case of wanting to shard based
> on an object instead of a simple string, but it does sound like it might be
> a problem.
>
> Anyway, for purposes of being *unique* we may need to put environ uuid in
> there, but for the purposes of sharding we could just put it on another
> field and index that field.
>
> John
> =:->
>
> On Fri, Jul 4, 2014 at 5:01 AM, Tim Penhey <tim.pen...@canonical.com>
> wrote:
>
>> Hi folks,
>>
>> Very shortly we are going to start on the work to be able to store
>> multiple environments within a single mongo database.
>>
>> Most of our current entities are stored in the database with their name
>> or id fields serialized to bson as the _id field.
>>
>> As far as I know (and I may be wrong), if you are adding a document to
>> the mongo collection, and you do not specify an _id field, mongo will
>> create a unique value for you.
>>
>> In our new world, things that used to be unique, like machines,
>> services, units etc, are now only unique when paired with the
>> environment id.
>>
>> It seems we have a number of options here.
>>
>> 1. change the _id field to be a "composed" field where it is the
>> concatenation of the environment id and the existing id or name field.
>> If we do take this approach, I strongly recommend having the fields that
>> make up the key be available by themselves elsewhere in the document
>> structure.
>>
>> 2. let mongo create the _id field, and we ensure uniqueness over the
>> pair of values with a unique index. One thing I am unsure about with
>> this approach is how we currently do our insertion checks, where we do a
>> "document does not exist" check. We wouldn't be able to do this as a
>> transaction assertion as it can only check for _id values. How fast are
>> the indices updated? Can having a unique index for a document work for
>> us? I'm hoping it can if this is the way to go.
>>
>> 3. use a composite _id field such that the document may start like this:
>> { _id: { env_uuid: "blah", name: "foo"}, ...
>> This gives the benefit of existence checks, and real names for the _id
>> parts.
>>
>> Thoughts? Opinions? Recommendations?
>>
>> BTW, I think that if we can make 3 work, then it is the best approach.
>>
>> Tim
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev