First, you should think about what the most common operation(s) will
be in the application.  Design so that those common operations perform
well the majority of the time; don't optimize for the 0.05% case.  In
other words, optimize for the 99% case.

  The list-property limit of 5,000 applies to indexed lists, and
refers to the number of index rows for an entity.  With the number of
items you're expecting to store, you should not use a list property;
it would be quite inefficient.
  
http://code.google.com/appengine/docs/python/datastore/overview.html#Quotas_and_Limits

  Tim's suggestion to serialize the data is great when you don't need
to search it.  However, you do need to search the data, so you'll need
a solution for that.  You could maintain some kind of 'index' so that
you know which 'list entity' a particular key belongs in; that way you
know exactly which shard of the list to fetch and search.  This should
be very easy if you keep the list sorted.  If you use a serialized
list, you could compress it to squeeze more values per entity, but I
think you'll still wind up sharding the list across entities.
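  To make the sorted-shard idea concrete, here's a minimal sketch
(names like shard_starts are my own, not from any App Engine API): if
you keep a small list of each shard's first id, a binary search tells
you which single shard entity to fetch for a given id.

```python
import bisect

def shard_for(item_id, shard_starts):
    """Given a sorted list of each shard's first id, return the index
    of the shard whose range would contain item_id."""
    # bisect_right finds the insertion point; the shard just before
    # that point covers the range item_id falls into.
    i = bisect.bisect_right(shard_starts, item_id)
    return max(i - 1, 0)

# Example: three shards holding ids [0..999], [1000..1999], [2000..].
shard_starts = [0, 1000, 2000]
print(shard_for(1500, shard_starts))  # -> 1
```

You'd fetch only that one 'list entity' by key, deserialize it, and
search within it, rather than loading the whole list.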

  Another option you could consider is a per-user Bloom filter.  For
lookups, you could store a serialized Bloom-filter index that can be
quickly loaded and searched.  Then you can decide which ids need the
more expensive 'deep-search' operation.  Store the actual list of ids
in whatever way is convenient for the most common operation(s).
  http://en.wikipedia.org/wiki/Bloom_filter
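  A rough sketch of what that per-user filter could look like, in
plain Python (the class and parameters here are illustrative, not any
existing library; a real deployment would size m and k to the expected
id count and acceptable false-positive rate):

```python
import hashlib

class BloomFilter(object):
    """Minimal Bloom filter: m bits, k hash positions per key derived
    from salted md5.  False positives are possible; false negatives
    are not, so 'might_contain == False' lets you skip a deep search.
    """
    def __init__(self, m=8192, k=4, bits=None):
        self.m, self.k = m, k
        self.bits = bytearray(bits) if bits is not None else bytearray(m // 8)

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.md5(('%d:%s' % (i, key)).encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

    def serialize(self):
        # This small blob is what you'd store per user, e.g. in an
        # unindexed blob property, and reload on each lookup.
        return bytes(self.bits)

bf = BloomFilter()
bf.add('id-42')
print(bf.might_contain('id-42'))  # -> True
```

For a batch of 100 ids you'd test each against the filter and only
run the expensive lookup for the ones that might be present.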

  Also, having a lot of entities won't be an issue if you've designed
your app to work well with your common operations.  I've dealt with
applications that have billions (yes, with a *B*) of entities and they
perform very well.  Particularly if the entities are small, you can
fetch 100s at a time without issues.  But, note that writing will
likely be much more expensive than a serialized list based approach.
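  Fetching hundreds at a time just means batching your keys; a trivial
helper like the one below does it (the commented db.get call is how
you'd use it on App Engine, where get accepts a list of keys):

```python
def chunked(seq, size):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

print(list(chunked([1, 2, 3, 4, 5], 2)))  # -> [[1, 2], [3, 4], [5]]

# On App Engine you'd do something like:
# for batch in chunked(all_keys, 500):
#     entities = db.get(batch)  # one batch RPC per 500 keys
```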


Just a few thoughts.

Robert



On Tue, Apr 19, 2011 at 20:15, nischalshetty <nischalshett...@gmail.com> wrote:
> @Tim
>
> That seems like a new approach that I did not have in mind. I can put them
> into an unindexed text field. I however do need to search for a bunch of ids
> to see if they are present in the datastore for a given user. Using the
> unindexed approach, I would need to pull all the ids in memory and then
> search over them.
>
> Seems good but I've seen when you have hundreds of thousands of ids, the 1
> mb limit is breached. So if I'm not wrong, I would still need to have
> multiple rows when the user has more than a certain number of ids.
>
> As far as serializing and de-serializing costs are involved, in Brett
> Slatkin's Google IO talk ( http://goo.gl/1KVvX see page 21) he mentions that
> serialization overhead is something that is best avoided though I wouldn't
> mind it so much if it solves my purpose.
>
> My priority would be to find a datastore approach to solve my problem (of
> querying to check if a list of 100 ids is present for a particular user).
> If I'm not able to then I would probably go with the approach that you have
> mentioned.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to google-appengine@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
