[google-appengine] Re: Tuples, Exploding Index, IO talk

Nick Johnson (Google) Mon, 22 Jun 2009 05:55:10 -0700

Hi hawkett,

I presume by "key hash", you mean the string obtained by stringifying a Key
object (eg, str(key)). This is not a hash, but rather a base64 encoding of
the Key Protocol Buffer. We don't necessarily guarantee this encoding scheme
will not change (though it's rather unlikely), but we do guarantee that
db.Key(str(key)) == key - that is, that you'll always be able to reconstruct
a key from its string-form.


-Nick Johnson

On Mon, Jun 22, 2009 at 1:48 PM, hawkett <hawk...@gmail.com> wrote:

>
> This actually leads to another question that has been on my mind - how
> reliable is a key's string representation?  If one part of my tuple is
> a key, can I store it's hash and reliably turn that into the actual
> key for the object at a (much) later date, or do I need to store the
> key_name and path?  Put another way, does Google guarantee that the
> Key hashing algorithm will never change?  Is the hash produced for a
> given key_name and path guaranteed to be the same on both the SDK and
> the live environment?
>
> Just wanting to get a clear idea of the best approach to serialising
> keys.  Thanks,
>
> Colin
>
> On Jun 22, 11:10 am, "Nick Johnson (Google)" <nick.john...@google.com>
> wrote:
> > Hi hawkett,
> >
> >
> >
> > On Sat, Jun 20, 2009 at 3:05 PM, hawkett <hawk...@gmail.com> wrote:
> >
> > > Hi,
> >
> > >   I was watching Brett's IO talk re. using 'Relational Index Tables',
> > > and there were a few hints of things in there, and I just wanted to
> > > check I got it all correctly -
> >
> > > 1.  Lists are good for tuples - a use case I see is an entity being
> > > tagged, and having a state within that tag - so the tuples might be
> > > ('tagA', 'PENDING') , ('tagB', 'ACCEPTED'), ('tagC', 'DENIED') etc. -
> > > so the list structures would be
> >
> > > class Thing(db.Model):
> > >  name = db.StringProperty()
> > >  tags = db.ListProperty(str, default=[])
> > >  states = db.ListProperty(str, default=[])
> >
> > > with their contents tags = ['tagA', 'tagB', 'tagC'], states =
> > > ['PENDING', 'ACCEPTED', 'DENIED']
> >
> > > and as data comes and goes you maintain both lists to ensure you
> > > record the correct state for the correct tag by matching their list
> > > position.
> >
> > A much better approach is to use a single ListProperty, and serialize
> your
> > tuples to it - using Pickle, JSON, CSV, etc - whatever suits. If you
> want,
> > you can easily write a custom Datastore property class to make this
> easier.
> > This allows you to do everything you outlined below without extra effort.
> >
> > -Nick Johnson
> >
> >
> >
> >
> >
> > > 2.  Relational Index Tables are good for exploding index problems - so
> > > the query here might be -
> > > "get me all the 'Things' which have 'tagA' and which are 'PENDING' in
> > > that tag" - i.e. all records with the tuple ('tagA, 'PENDING'), which
> > > would be a composite index over two list properties - an exploding
> > > index.
> >
> > > So assuming I've got the above right, I'm trying to work out a few
> > > things
> >
> > > a.  Without relational index tables, what is the best way to construct
> > > the query - e.g.
> >
> > > things = db.GqlQuery(
> > >  "SELECT * FROM Thing "
> > >  "WHERE tags = :1 AND states = :2", 'tagA', 'PENDING')
> >
> > > which would get me anything that had 'tagA' at any point in the tags
> > > list, and anything that had a 'PENDING' at any point in the states
> > > list. This is potentially many more records than those that match the
> > > tuple.  So then I have to do an in-memory cull of those records
> > > returned and work out which ones actually conform to the tuple?  Just
> > > wondering if I am missing something here, because it seems like a
> > > great method for storing a tuple, but complex to query for that same
> > > tuple?
> >
> > > b.  If I am going to use relational index tables, to avoid the
> > > exploding index that the above query could generate -
> >
> > > class Thing(db.model):
> > >  name = db.StringProperty()
> >
> > > class ThingTagIndex(db.Model)
> > >  tags = db.ListProperty(str, default=[])
> >
> > > class ThingStateIndex(db.Model)
> > >  states = db.ListProperty(str, default=[])
> >
> > > then am I right in thinking that my query would be performed as
> >
> > > tagIndexKeys = db.GqlQuery(
> > >  "SELECT __key__ FROM ThingTagIndex "
> > >  "WHERE tags = :1", 'tagA')
> >
> > > # All the things that have 'tagA' in their tags list
> > > thingTagKeys = [k.parent() for k in tagIndexKeys]
> >
> > > stateIndexKeys = db.GqlQuery(
> > >  "SELECT __key__ FROM ThingStateIndex "
> > >  "WHERE states = :1 AND ANCESTOR IN :2", 'PENDING', thingTagKeys)
> >
> > > # All the things that have both 'tagA' and 'PENDING' (but not
> > > necessarily as a tuple)
> > > thingKeys = [k.parent() for k in stateIndexKeys]
> >
> > > things = db.get(thingKeys)
> >
> > > # Oops - I need the lists to do the culling part of my tuple query
> > > from (a)
> >
> > > So I have avoided the exploding index by performing two separate
> > > queries, but I could have achieved much the same result without the
> > > index tables - i.e. by performing separate queries and avoiding the
> > > composite index.  Just wondering if I am seeing the tuple situation
> > > correctly - i.e. there is no way to query them that doesn't require
> > > some in-memory culling?  Thanks,
> >
> > > Colin
> >
> > --
> > Nick Johnson, App Engine Developer Programs Engineer
> > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
> Number:
> > 368047
> >
>


-- 
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
368047

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

[google-appengine] Re: Tuples, Exploding Index, IO talk

Reply via email to