First, think about what the most common operation(s) will be in the application, and design so that those perform well. Don't optimize for the 0.05% case; in other words, optimize for the 99% case.
The list-property limit of 5,000 applies to indexed lists, and refers to the number of index rows for an entity. With the number of items you're expecting to store in the list, you should not use a list property; it would be quite inefficient.
http://code.google.com/appengine/docs/python/datastore/overview.html#Quotas_and_Limits

Tim's suggestion to serialize the data is great when you don't need to search the data. However, you do need to search the data, so you'll need a solution for that. You could maintain some kind of 'index' so that you know which 'list entity' a particular key belongs in; that way you know exactly which shard of the list to fetch and search. This should be easy if you keep the list sorted.

If you use a serialized list, you could compress it to squeeze more values per entity, but I think you'll still wind up sharding the list across entities.

Another option to consider is a per-user bloom filter. For lookups you could store a serialized bloom-filter index that can be quickly loaded and searched. That lets you decide which ids need the more expensive 'deep-search' operation. Store the actual list of ids in whatever form is convenient for the most common operation(s).
http://en.wikipedia.org/wiki/Bloom_filter

Also, having a lot of entities won't be an issue if you've designed your app to work well with your common operations. I've dealt with applications that have billions (yes, with a *B*) of entities, and they perform very well. Particularly if the entities are small, you can fetch 100s at a time without issues. Note, though, that writes will likely be much more expensive than with a serialized-list approach.

Just a few thoughts.

Robert

On Tue, Apr 19, 2011 at 20:15, nischalshetty <nischalshett...@gmail.com> wrote:
> @Tim
>
> That seems like a new approach that I did not have in mind. I can put them
> into an unindexed text field.
> I do, however, need to search for a bunch of ids to see if they are
> present in the datastore for a given user. Using the unindexed approach, I
> would need to pull all the ids into memory and then search over them.
>
> That seems good, but I've seen that when you have hundreds of thousands of
> ids, the 1 MB limit is breached. So if I'm not wrong, I would still need
> multiple rows when a user has more than a certain number of ids.
>
> As far as serialization and deserialization costs go, in Brett Slatkin's
> Google I/O talk ( http://goo.gl/1KVvX see page 21) he mentions that
> serialization overhead is best avoided, though I wouldn't mind it so much
> if it solves my purpose.
>
> My priority would be to find a datastore approach that solves my problem
> (of querying to check if a list of 100 ids is present for a particular
> user). If I'm not able to, then I would probably go with the approach you
> have mentioned.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to google-appengine@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
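To make the 'index of shards' idea above concrete: keep each user's ids sorted and split across several shard entities, plus a small index of each shard's smallest id, so a membership check only has to load and search one shard. This is a minimal, datastore-free sketch in plain Python; the names (build_shards, contains) and the shard size are illustrative assumptions, not a GAE API:

```python
import bisect

def build_shards(sorted_ids, shard_size=1000):
    """Split a sorted list of ids into fixed-size shards.

    Returns (index, shards): 'index' holds the smallest id of each shard,
    and in a real app each entry of 'shards' would be stored as its own
    (serialized) entity.
    """
    shards = [sorted_ids[i:i + shard_size]
              for i in range(0, len(sorted_ids), shard_size)]
    index = [shard[0] for shard in shards]  # smallest id per shard
    return index, shards

def contains(index, shards, target_id):
    """Membership check that touches exactly one shard."""
    # Rightmost shard whose smallest id is <= target_id ...
    pos = bisect.bisect_right(index, target_id) - 1
    if pos < 0:
        return False  # target is smaller than every stored id
    shard = shards[pos]
    # ... then binary-search within that shard.
    j = bisect.bisect_left(shard, target_id)
    return j < len(shard) and shard[j] == target_id
```

For a batch of 100 ids you would group the targets by shard first, so each shard entity is fetched and scanned at most once.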
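The per-user bloom filter Robert suggests could be sketched as below. The idea: serialize the filter's bit array into a single unindexed property, load it per request, and only run the expensive 'deep-search' for ids the filter says *might* be present (a "no" from the filter is definitive; a "yes" can be a false positive). The class and parameters here are illustrative assumptions, not any real GAE or library API:

```python
import hashlib

class BloomFilter:
    """A simple bloom filter over a fixed-size bit array."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        # This bytearray is what you would serialize into an entity property.
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(str(key).encode()).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, 'big') % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False -> definitely absent; True -> maybe present, do the deep search.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

Size the bit array for your expected id count: the false-positive rate grows as the filter fills, so with hundreds of thousands of ids per user you would want a proportionally larger (still compressible) bit array.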