I tend to agree. I'm not sure how many records you're expecting, but it sure seems like a lot of complicated overhead just to avoid some storage cost. Obviously, if you're expecting a large number of entities, or you've got a lot of indexes and/or reference properties, it might save some storage space, but you'll need several tens of millions of entities to save a few dollars. Have you calculated your expected savings from this?
Robert

On Tue, Aug 9, 2011 at 08:32, tempy <fay...@gmail.com> wrote:
> Murph, I've been doing exactly what you're doing, creating GAE entities
> with UUIDs corresponding to UUID keys from an external DB. I just
> stick the UUID into the key's name property, and it works just fine. I
> think you're overcomplicating things - GAE storage is dirt cheap, it's
> CPU time you have to worry about, so you should spend your
> optimization cycles on getting request speeds down and doing as much
> work as possible asynchronously, as opposed to esoteric hashing which
> will in the end win you peanuts.
>
> On Aug 9, 12:31 pm, Murph <paul.j.mu...@googlemail.com> wrote:
>> After a little testing, I discovered a flaw in my previously posted
>> technique, namely that it was generating uint64 IDs, when the datastore will
>> only accept non-zero uint63s. I'd also forgotten to include my UUIDProperty
>> class. Here's the updated version. The question remains whether there's
>> likely to be any horrible loss of efficiency or problems by assigning IDs
>> which will be essentially randomly distributed through the possible number
>> space (the external DB mostly uses UUIDv4, so the distribution should be
>> quite random). As previously stated, auto-assigned IDs will not be used for
>> these models, so collisions or disruption of auto-assignment are not a major
>> concern. The code probably still needs the odd bit of polishing here and
>> there.
>>
>> MASK_64 = 2**64-1
>> MASK_63 = 2**63-1
>>
>> class UUID(uuid.UUID):
>>     # Could use this, but Python doesn't guarantee future stability of
>>     # hash values
>>     # def get_id(self):
>>     #     return abs(hash(self.int))
>>
>>     def get_id(self):
>>         """ Returns a positive, non-zero 63 bit integer from the
>>         UUID's int128 """
>>         x = ((self.int & MASK_64) ^ (self.int >> 64)) & MASK_63
>>         if (x == 0):
>>             x = 1
>>         return x
>>
>>     id = property(get_id)
>>
>> class UUIDProperty(Property):
>>     """A UUID property, stored as a 16 byte binary string."""
>>
>>     data_type = UUID
>>
>>     def get_value_for_datastore(self, model_instance):
>>         uuid = super(UUIDProperty,
>>                      self).get_value_for_datastore(model_instance)
>>         return ByteString(uuid.bytes)
>>
>>     def make_value_from_datastore(self, value):
>>         if value is None:
>>             return None
>>         return UUID(bytes=value)
>>
>>     def validate(self, value):
>>         if value is not None and not isinstance(value, self.data_type):
>>             try:
>>                 value = self.data_type(value)
>>             except TypeError, err:
>>                 raise BadValueError('Property %s must be convertible '
>>                                     'to a %s instance (%s)' %
>>                                     (self.name, self.data_type.__name__,
>>                                      err))
>>         value = super(UUIDProperty, self).validate(value)
>>         if value is not None and not isinstance(value, self.data_type):
>>             raise BadValueError('Property %s must be a %s instance' %
>>                                 (self.name, self.data_type.__name__))
>>         return value
>>
>> class UUIDModel(Model):
>>     @classmethod
>>     def get_by_uuid(cls, uuids, **kwds):
>>         uuids, multiple = datastore.NormalizeAndTypeCheck(uuids,
>>                                                           (UUID, str))
>>         def normalize(uuid):
>>             if isinstance(uuid, str):
>>                 return UUID(uuid)
>>             else:
>>                 return uuid
>>         uuids = [normalize(uuid) for uuid in uuids]
>>         ids = [uuid.id for uuid in uuids]
>>         entities = cls.get_by_id(ids, **kwds)
>>         for index, entity in enumerate(entities):
>>             if entity is not None and entity.uuid != uuids[index]:
>>                 raise BadKeyError('UUID hash collision detected (class %s): '
>>                                   '%s / %s'
>>                                   % (cls.kind(), entity.uuid, uuids[index]))
>>         if multiple:
>>             return entities
>>         else:
>>             return entities[0]
>>
>>     @classmethod
>>     def get_or_insert_by_uuid(cls, uuid, **kwds):
>>         if isinstance(uuid, str):
>>             uuid = UUID(uuid)
>>         id = uuid.id
>>         def txn():
>>             entity = cls.get_by_id(id, parent=kwds.get('parent'))
>>             if entity is None:
>>                 entity = cls(key=Key.from_path(cls.kind(), id,
>>                                                parent=kwds.get('parent')),
>>                              uuid=uuid,
>>                              **kwds)
>>                 entity.put()
>>             elif entity.uuid != uuid:
>>                 raise BadKeyError('UUID hash collision detected (class %s): '
>>                                   '%s / %s'
>>                                   % (cls.kind(), entity.uuid, uuid))
>>             return entity
>>         return db.run_in_transaction(txn)
>>
>>     uuid = UUIDProperty('UUID')
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to google-appengine@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
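
For anyone who wants to poke at the XOR-folding from the quoted get_id without an App Engine environment, here is a minimal standalone sketch in plain Python. The function names fold_uuid_to_id and collision_probability are made up for this illustration and are not part of Murph's code or of any App Engine API. The collision estimate is just the usual birthday approximation, and it assumes the folded IDs are uniformly random over the 63-bit space, which is only roughly true in practice.

```python
import math
import uuid

MASK_64 = 2**64 - 1
MASK_63 = 2**63 - 1


def fold_uuid_to_id(u):
    """Fold a 128-bit UUID into a non-zero 63-bit integer (hypothetical helper)."""
    # XOR the low 64 bits with the high 64 bits, then drop the top bit
    # so the result fits in 63 bits.
    x = ((u.int & MASK_64) ^ (u.int >> 64)) & MASK_63
    # Zero is not usable as an ID, so remap it to 1.
    return x if x != 0 else 1


def collision_probability(n, bits=63):
    """Birthday-style estimate of at least one collision among n IDs.

    Assumes the IDs are uniformly random over 2**bits values.
    """
    # p ~= 1 - exp(-n*(n-1) / (2 * 2**bits)); expm1 keeps precision
    # when the probability is very small.
    return -math.expm1(-n * (n - 1) / (2.0 * 2**bits))


if __name__ == "__main__":
    # This UUID has identical high and low halves, so the XOR is zero
    # and the helper falls back to 1.
    print(fold_uuid_to_id(uuid.UUID("12345678-1234-5678-1234-567812345678")))
    # A random UUIDv4, as the external DB in the thread mostly uses.
    print(fold_uuid_to_id(uuid.uuid4()))
    # Rough collision odds for a hypothetical 50 million entities.
    print(collision_probability(50 * 10**6))
```

Whatever the estimate says for your entity count, the collision check in the quoted get_by_uuid and get_or_insert_by_uuid is still needed, since folding 128 bits down to 63 can never rule collisions out.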