Option 1 can be improved substantially by batching the writes:
db.put([Tag1, Tag2, Tag3,....])
which is a single datastore put call.
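
To make the difference concrete, here is a rough sketch that counts round trips for the two write styles. FakeDatastore is a hypothetical in-memory stand-in I made up for illustration, not the App Engine API; in a real app the batched call would be db.put() with a list of model instances.

```python
class FakeDatastore(object):
    """Hypothetical stand-in that counts calls; not the real db module."""

    def __init__(self):
        self.calls = 0
        self.entities = []

    def put(self, entity_or_list):
        # Like db.put, accept either a single entity or a list of entities.
        self.calls += 1
        if isinstance(entity_or_list, list):
            self.entities.extend(entity_or_list)
        else:
            self.entities.append(entity_or_list)


def write_one_by_one(store, url, tags):
    # One call per tag-url pair, as in the original Option 1.
    for tag in tags:
        store.put((tag, url))


def write_batched(store, url, tags):
    # Build every tag-url pair first, then hand them over in one call.
    store.put([(tag, url) for tag in tags])
```

Batching cuts the number of round trips. As far as I know each entity is still stored and indexed individually, so it helps latency more than it helps the storage side of the quota.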

Option 2 does not suffer from the problem you assume: the lists do not
have to be traversed.
In fact, the ListProperty is indexed in a massively parallel fashion. Each
value in the ListProperty is indexed separately, so a query on a tag is an
index lookup rather than a scan of every list.
The number of items in the list does not really impact the performance of
updating the index.
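
A toy model of that idea, assuming one sorted index row per (tag, entity key) pair. TagIndex and its methods are hypothetical names for illustration only, and this sketch writes rows one at a time where the real datastore applies them in parallel.

```python
import bisect


class TagIndex(object):
    """Toy per-value index: one sorted row per (tag, entity_key)."""

    def __init__(self):
        self.rows = []  # kept sorted by (tag, entity_key)

    def put(self, entity_key, tags):
        # Every value in the list gets its own index row.
        for tag in set(tags):
            bisect.insort(self.rows, (tag, entity_key))

    def keys_with_tag(self, tag):
        # Binary search to the first row for this tag, then read only
        # the contiguous rows that match. No tag list is ever traversed.
        i = bisect.bisect_left(self.rows, (tag,))
        result = []
        while i < len(self.rows) and self.rows[i][0] == tag:
            result.append(self.rows[i][1])
            i += 1
        return result
```

The cost of a lookup here depends on how many entities carry the tag, not on how many tags each entity has.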

Option 2 has an added benefit: you can find entities that have all of
several tags.
GqlQuery("SELECT __key__ from Tags WHERE tags = :1 AND tags = :2 AND tags = :3"  ,

This returns only entities whose tags list contains all three values (:1,
:2 and :3). It is done in a distributed merge join fashion and is very fast
compared to any alternative way of determining this.
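
For the curious, the general merge-join idea can be sketched as below. This is not the datastore's actual implementation, which is internal; merge_join is a hypothetical helper that assumes each input is a sorted list of unique entity keys, one list per tag filter.

```python
import bisect


def merge_join(*sorted_key_lists):
    """Return the keys present in every list; lists must be sorted ascending."""
    if not sorted_key_lists:
        return []
    positions = [0] * len(sorted_key_lists)
    result = []
    while True:
        # Collect the current candidate from each list; stop as soon as
        # any list is exhausted, since no further match is possible.
        current = []
        for lst, pos in zip(sorted_key_lists, positions):
            if pos >= len(lst):
                return result
            current.append(lst[pos])
        target = max(current)
        if all(key == target for key in current):
            # Every list agrees on this key, so it matches all filters.
            result.append(target)
            positions = [pos + 1 for pos in positions]
        else:
            # Skip each list forward to the largest candidate seen.
            for i, lst in enumerate(sorted_key_lists):
                positions[i] = bisect.bisect_left(lst, target, positions[i])
```

Because each list is already sorted, the join skips ahead with a binary search instead of comparing every key against every other key.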

-Kevin


On Thu, Oct 8, 2009 at 2:00 PM, Shailen <shailen.t...@gmail.com> wrote:

>
> Started working on my first app-engine app and have a question about
> how to organize my data.
>
> Here's a highly simplified version of one part of my app:
>  for any url, the app generates a bunch of tags (about 100 - 300 tags
> per url, although most urls will
>  have tags in the lower part of the range).
>
> I am considering 2 options, but both have (I suspect) prohibitive
> downsides.
>
> Option 1, Relationship model:
>  create class TagUrl
>  to write:
>    TagUrl(tag=some_tag, url=some_url).put()
>    TagUrl(tag=some_other_tag, url=some_url).put()
>    etc., for each tag-url combo
>
> That is a *lot* of writes to the datastore for each url and will
> probably cause me to burn through my GAE quotas in a hurry. The reads
> - to find all urls that match a tag - should be reasonably efficient.
> So, high write costs, low read costs.
>
> Option 2, use list properties (derived from Brett Slatkin's Building
> Scalable, Complex app... talk):
>   create 2 classes, URL and Tags:
>     class URL(db.Model):
>       url = db.StringProperty()
>
>     class Tags(db.Model):
>       tags = db.StringListProperty()
>
>   to write:
>     store a list of tags with a url key as parent
>   to read:
>     indexes = GqlQuery("SELECT __key__ from Tags WHERE tags = :1",
>                        target_tag)
>     keys = [k.parent() for k in indexes]
>     urls = db.get(keys)
>
> This option is better, but how efficient are the reads, really? For
> 1000 urls, there will be 1000 lists of tags which will have to be
> traversed to find the target_tag.  Assuming each list has 100 or so
> elements, that's an awful lot of reading. The problem worsens,
> obviously, for more urls (as is likely to be the case in my app) and
> longer lists.
>
> Anyone have a solution for this? I am leaning towards option 2, but
> worry about how scalable it really is.
>
> - Shailen Tuli
>


-- 
Kevin Pierce
Software Architect
VendAsta Technologies Inc.
kpie...@vendasta.com
(306)955.5512 ext 103
www.vendasta.com
