Re: [google-appengine] datastore advice?

2010-10-27 Thread Ikai Lan (Google)
Generally speaking: no. Entity groups increase the chance of data
locality, but this should not affect index traversal or batch reads.

A best practice, however, is to keep entity groups as small as possible.
If you don't need transactions, there aren't many compelling reasons not to
use root entities, since working with root entities is generally simpler.

--
Ikai Lan
Developer Programs Engineer, Google App Engine
Blogger: http://googleappengine.blogspot.com
Reddit: http://www.reddit.com/r/appengine
Twitter: http://twitter.com/app_engine



On Tue, Oct 26, 2010 at 3:26 PM, djidjadji  wrote:

> Parent and child objects are stored in the same Bigtable node. This is
> done to support transactions: a transaction works on a single
> entity group.
> When you perform a query, an index is used to find the objects needed. At
> this point there is no performance penalty for parent or child objects
> that match the query. The query results in a number of keys of objects
> that need to be retrieved. Query response time is best when the objects
> to fetch are spread across as many Bigtable nodes as possible (parallel fetch).
>
> Why do you need so many child objects?
> Can you implement it by giving the child object a reference property
> that points to the parent, and thus remove the need to store the whole
> entity group in one Bigtable node?
>
> Parent-child objects are needed
> 1) if you need transactions on them
> or
> 2) if you want to extract the parent key given a child key.
>    Perform a keys_only query on the child objects, and from these keys get
>    the set of parent keys of the objects that you need to fetch in full.
>    Brett Slatkin uses this technique in a number of Google I/O talks:
>    the child object has a ListProperty that is used in the keys_only
>    query.
>
> Most other applications can be implemented without the explicit parent
> object.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



Re: [google-appengine] datastore advice?

2010-10-26 Thread djidjadji
Parent and child objects are stored in the same Bigtable node. This is
done to support transactions: a transaction works on a single
entity group.
When you perform a query, an index is used to find the objects needed. At
this point there is no performance penalty for parent or child objects
that match the query. The query results in a number of keys of objects
that need to be retrieved. Query response time is best when the objects
to fetch are spread across as many Bigtable nodes as possible (parallel fetch).
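
The two phases described above (read the matching index rows, then fetch the entities by key) can be sketched as a toy in-memory model. Everything here is hypothetical: `store`, `kind_index`, `put` and `query_kind` are illustrative names, not App Engine APIs, and a plain dict stands in for Bigtable's sorted index rows.

```python
from collections import defaultdict

# Toy in-memory model for illustration only; this is NOT the App Engine API.
# A key is a path tuple such as ("Parent", "Jane", "Child", "c0"); its last
# two elements are the entity's own kind and name.
store = {}                      # key -> entity (a plain dict)
kind_index = defaultdict(list)  # kind -> keys of entities of that kind


def put(key, entity):
    store[key] = entity
    kind_index[key[-2]].append(key)  # index the entity under its own kind


def query_kind(kind):
    keys = kind_index[kind]          # phase 1: read only this kind's index rows
    return [store[k] for k in keys]  # phase 2: fetch those entities by key


put(("Parent", "Jane"), {"name": "Jane"})
put(("Parent", "Margaret"), {"name": "Margaret"})
for i in range(10_000):
    # Children live in Jane's entity group: their key path starts with hers.
    put(("Parent", "Jane", "Child", f"c{i}"), {"n": i})

parents = query_kind("Parent")
```

In this model, querying the `Parent` kind reads two index rows whether Jane has two children or ten thousand, because the children are indexed under their own kind.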

Why do you need so many child objects?
Can you implement it by giving the child object a reference property
that points to the parent, and thus remove the need to store the whole
entity group in one Bigtable node?
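
As a sketch of the alternative suggested here, the children can be root entities that merely record which parent they belong to. The classes and the `children_of` helper are hypothetical in-memory stand-ins, assuming a plain string field in place of a real reference property.

```python
from dataclasses import dataclass


# Hypothetical in-memory stand-ins for two root-entity kinds. In the Python
# db API the child would more likely declare a ReferenceProperty; a plain
# string field stands in for it here.
@dataclass(frozen=True)
class Parent:
    name: str


@dataclass(frozen=True)
class Child:
    name: str
    parent_name: str  # plays the role of the reference to the parent entity


parents = [Parent("Jane"), Parent("Margaret"), Parent("Graham"), Parent("Arthur")]
children = [
    Child("Sam", "Jane"),
    Child("Robert", "Jane"),
    Child("Lisa", "Margaret"),
    Child("Rowen", "Arthur"),
    Child("Jerry", "Arthur"),
]


def children_of(parent_name):
    # Equivalent of an equality filter on the reference property.
    return [c.name for c in children if c.parent_name == parent_name]
```

The trade-off is the one named in the thread: parent and child are then in different entity groups, so they cannot be updated together in a single transaction.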

Parent-child objects are needed
1) if you need transactions on them
or
2) if you want to extract the parent key given a child key.
   Perform a keys_only query on the child objects, and from these keys get
   the set of parent keys of the objects that you need to fetch in full.
   Brett Slatkin uses this technique in a number of Google I/O talks:
   the child object has a ListProperty that is used in the keys_only
   query.
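
The keys_only technique in point 2 can be sketched as follows. The `Key` class, the `Message`/`MessageIndex` kinds and the receiver lists are hypothetical stand-ins chosen for illustration; a real application would run a keys_only datastore query and a batch get instead of the dictionary lookups shown here.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical minimal key type: parent() returns the key of the
# entity-group parent, mimicking the idea of a datastore key. Not db.Key.
@dataclass(frozen=True)
class Key:
    kind: str
    name: str
    parent_key: Optional["Key"] = None

    def parent(self):
        return self.parent_key


# Parent entities (messages) and one child "index" entity per message that
# holds a list of receivers (the ListProperty used in the keys_only query).
messages = {
    Key("Message", "m1"): "Lunch at noon?",
    Key("Message", "m2"): "Build is green",
    Key("Message", "m3"): "Offsite agenda",
}
index_children = {
    Key("MessageIndex", "i1", Key("Message", "m1")): ["alice", "bob"],
    Key("MessageIndex", "i2", Key("Message", "m2")): ["carol"],
    Key("MessageIndex", "i3", Key("Message", "m3")): ["alice"],
}


def keys_only_query(receiver):
    # Step 1: return only the keys of child entities whose list contains receiver.
    return [k for k, receivers in index_children.items() if receiver in receivers]


def messages_for(receiver):
    parent_keys = [k.parent() for k in keys_only_query(receiver)]  # step 2
    return [messages[k] for k in parent_keys]  # step 3: batch get the parents
```

Only keys are read from the child entities, so their potentially large list properties never have to be loaded; the parents are then fetched in one batch.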

Most other applications can be implemented without the explicit parent object.

2010/10/26 Charles :
> Hi all,
>
> I'm wondering, since the datastore is hierarchical, does the number of
> children an entity has affect the performance of queries on the parents
> themselves?  For example, if I have a set of parents, say...
>
> Jane
> Margaret
> Graham
> Arthur
>
> ...and I have a set of children associated with those parents...
>
> Jane
>   -Sam
>   -Robert
> Margaret
>   -Lisa
> Graham
> Arthur
>   -Rowen
>   -Jerry
>
> ...will the number of children for each parent affect the performance of
> querying the parents themselves?  For instance, if I wanted to select all of
> the parents (SELECT * FROM parents), that would be easy with the data
> above.  But, since the datastore is hierarchical, does the performance get
> hampered if, say, the parents have many thousands or even millions of
> children?  Say, like...
>
> Jane
>   -Sam
>   -Robert
>   ...1 million more
> Margaret
> ...
>
> If so, I'm just wondering if it would make more sense to make the children
> root entities too, so as not to affect the performance of querying on the
> parents.  Anyway, I hope I've explained my question well enough.
>
> Thanks in advance!
>
>
> Charles




[google-appengine] datastore advice?

2010-10-26 Thread Charles
Hi all,

I'm wondering, since the datastore is hierarchical, does the number of
children an entity has affect the performance of queries on the
parents themselves? For example, if I have a set of parents, say...

Jane
Margaret
Graham
Arthur

...and I have a set of children associated with those parents...

Jane
  -Sam
  -Robert
Margaret
  -Lisa
Graham
Arthur
  -Rowen
  -Jerry

...will the number of children for each parent affect the performance
of querying the parents themselves? For instance, if I wanted to select
all of the parents (SELECT * FROM parents), that would be easy with the
data above. But, since the datastore is hierarchical, does the
performance get hampered if, say, the parents have many thousands or even
millions of children? Say, like...

Jane
  -Sam
  -Robert
  ...1 million more
Margaret
...

If so, I'm just wondering if it would make more sense to make the
children root entities too, so as not to affect the performance of
querying on the parents. Anyway, I hope I've explained my question well
enough.

Thanks in advance!


Charles




Re: [google-appengine] Datastore Advice

2010-04-06 Thread Nick Johnson (Google)
Hi Tim,

I would suggest dividing articles into three categories (for each user):
1) Unread articles
2) Read articles
3) 'Unseen' articles.

The first two can be implemented by means of a 'UserArticle' entity that
represents the read/unread status of an article. I would suggest making them
child entities of a User entity. Category 3 exists to encompass articles
that were created after the last time the user fetched articles. Thus, the
process for getting a list of unread articles would go as follows:

- Get the timestamp of the most recent article seen by the user
- Fetch and return all articles more recent than that timestamp
  - For each returned article, add a UserArticle entry marking it as unread
- Fetch and return articles with UserArticle entries marking them as unread.
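
A minimal in-memory sketch of the steps above, assuming hypothetical `Article` and `UserState` classes in place of datastore entities (a dict on `UserState` stands in for the UserArticle child entities). Storing `last_seen` explicitly and advancing it after each fetch is an assumption; the reply only says to get the timestamp of the most recent article seen.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    article_id: str
    created: int  # creation timestamp, e.g. seconds since the epoch


@dataclass
class UserState:
    # Hypothetical per-user record. last_seen is the timestamp of the newest
    # article this user has been shown; user_articles stands in for the
    # UserArticle child entities (article_id -> "unread" or "read").
    last_seen: int = 0
    user_articles: dict = field(default_factory=dict)


def unread_articles(user, articles):
    # Unseen: articles created after the newest one the user has seen.
    unseen = [a for a in articles if a.created > user.last_seen]
    # Record each unseen article as unread (one UserArticle per article).
    for a in unseen:
        user.user_articles[a.article_id] = "unread"
    if unseen:
        # Assumed bookkeeping step: advance the timestamp past what was seen.
        user.last_seen = max(a.created for a in unseen)
    # Return every article currently marked unread, including the new ones.
    return sorted(aid for aid, status in user.user_articles.items()
                  if status == "unread")


def mark_read(user, article_id):
    user.user_articles[article_id] = "read"
```

For example, with articles `a1` (timestamp 1) and `a2` (timestamp 2) the first call returns both; after `mark_read(user, "a1")` and the arrival of `a3` (timestamp 3), the next call returns `["a2", "a3"]`.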


-Nick Johnson

On Tue, Apr 6, 2010 at 3:01 PM, timwhunt  wrote:

> Hi,
>
> I'm having trouble deciding the best way to use the datastore for a
> new app, so I am hoping I can get some advice.  I'm trying to decide
> the best way to keep my app fast and scalable.
>
> You can think of it as an RSS reader.  I'm trying to find a good way
> of keeping track of what articles each user has read so they can
> choose to see just the unread ones.  Here are the options I've thought
> of:
>
> Option A - One object for each article with a list property that holds
> the userIDs of all the users who have read it.  To get the list of
> unread articles for a user, I'd query with a filter that the list
> property != theUserID.  I'm starting to doubt this would work, as it
> sounds like the query would match if ANY list entry was != theUserID
> (rather than ALL list entries != theUserID.)  Also, adding userIDs to
> the list for that one object whenever a user reads the article might
> lead to contention.
>
> Option B - similar to Option A, but sharding the object for each
> article.  That is, instead of 1 object per article, have multiple
> (e.g., 10) and map each user to always use the same shard.  That might
> help the contention issue for marking which articles are read, but if
> Option A's query fundamentally won't work, the same problem exists
> here.
>
> Option C - Create a separate article object for each user.  So if I
> (am lucky enough to) have 10,000 users, there would be 10,000 objects for
> each article, with each object specific to one user and holding just
> their read status.  I think this would work, but it seems very
> wasteful.  It might also be a challenge to create all those objects
> (when articles are added or the next time each user logs in), perhaps
> running into the 30-second task limit.  I realize I could save
> some space by using objects that had only partial article info for
> each user (e.g., the article's key and any other properties for
> filtering), but then I think showing the list of unread articles would
> be a two-step process (get the list of article keys, and then get the
> article content for display).
>
> Option D - One object per article, but separate objects to mark which
> have been read by individual users.  Since the Datastore is different
> from a relational database, I think getting a list of unread articles
> would be a two-step process: first get the list of articles, then get
> a list of all the read articles and remove them.
>
> So those are the best ideas I've come up with, but none seems
> ideal.  Any suggestions would be greatly appreciated.
>
> Thanks!
> Tim


-- 
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047




[google-appengine] Datastore Advice

2010-04-06 Thread timwhunt
Hi,

I'm having trouble deciding the best way to use the datastore for a
new app, so I am hoping I can get some advice.  I'm trying to decide
the best way to keep my app fast and scalable.

You can think of it as an RSS reader.  I'm trying to find a good way
of keeping track of what articles each user has read so they can
choose to see just the unread ones.  Here are the options I've thought
of:

Option A - One object for each article with a list property that holds
the userIDs of all the users who have read it.  To get the list of
unread articles for a user, I'd query with a filter that the list
property != theUserID.  I'm starting to doubt this would work, as it
sounds like the query would match if ANY list entry was != theUserID
(rather than ALL list entries != theUserID.)  Also, adding userIDs to
the list for that one object whenever a user reads the article might
lead to contention.
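
The doubt about Option A can be illustrated with a simplified model of an inequality filter on a list property: the entity matches when any one of its values satisfies the filter, and an empty list is modeled as contributing no index rows, so it never matches. The functions are hypothetical helpers for illustration, not datastore calls.

```python
def matches_not_equal(readers, user_id):
    # Simplified model of `readers != user_id` on a list property: the entity
    # matches if ANY stored value differs. An empty list has no index rows for
    # the property, so any() over it is False and the entity never matches.
    return any(r != user_id for r in readers)


def actually_unread(readers, user_id):
    # What Option A really wants: the user is not among the readers at all.
    return user_id not in readers


cases = {
    "read by tim and ann": ["tim", "ann"],
    "read by nobody": [],
    "read only by tim": ["tim"],
}
for label, readers in cases.items():
    print(f"{label}: filter={matches_not_equal(readers, 'tim')}, "
          f"unread={actually_unread(readers, 'tim')}")
```

The first two cases disagree: an article read by tim and ann still matches the filter, while an article nobody has read is missed entirely.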

Option B - similar to Option A, but sharding the object for each
article.  That is, instead of 1 object per article, have multiple
(e.g., 10) and map each user to always use the same shard.  That might
help the contention issue for marking which articles are read, but if
Option A's query fundamentally won't work, the same problem exists
here.
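
Mapping each user to the same shard only needs a stable hash of the user ID. `NUM_SHARDS`, `shard_for` and the key-name scheme are illustrative assumptions; `hashlib` is used because Python's built-in `hash()` of a string is randomized per process, so it would not give the same shard across requests.

```python
import hashlib

NUM_SHARDS = 10  # the "e.g., 10" from Option B; an illustrative choice


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable across processes, unlike hash(), so a user always maps to the
    # same shard.
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def shard_key_name(article_id: str, user_id: str) -> str:
    # Hypothetical naming scheme for the per-shard "read by" entity.
    return f"{article_id}-shard{shard_for(user_id)}"
```

This spreads writes for one article over several entities, so it addresses contention only; it does not change how the query in Option A behaves.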

Option C - Create a separate article object for each user.  So if I
(am lucky enough to) have 10,000 users, there would be 10,000 objects for
each article, with each object specific to one user and holding just
their read status.  I think this would work, but it seems very
wasteful.  It might also be a challenge to create all those objects
(when articles are added or the next time each user logs in), perhaps
running into the 30-second task limit.  I realize I could save
some space by using objects that had only partial article info for
each user (e.g., the article's key and any other properties for
filtering), but then I think showing the list of unread articles would
be a two-step process (get the list of article keys, and then get the
article content for display).

Option D - One object per article, but separate objects to mark which
have been read by individual users.  Since the Datastore is different
from a relational database, I think getting a list of unread articles
would be a two-step process: first get the list of articles, then get
a list of all the read articles and remove them.
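
The two-step process in Option D amounts to a set difference. In this sketch `read_marks` is a hypothetical list of `(user_id, article_id)` pairs standing in for the per-user "read" entities.

```python
def unread_for(user_id, all_article_ids, read_marks):
    """read_marks: iterable of (user_id, article_id) pairs, one per 'read' entity."""
    # Step 2: collect the articles this user has read.
    read_by_user = {aid for uid, aid in read_marks if uid == user_id}
    # Step 1 result minus step 2 result, preserving the original article order.
    return [aid for aid in all_article_ids if aid not in read_by_user]
```

Step 1 still touches every article, so the cost grows with the total number of articles, whereas the timestamp-based approach in Nick Johnson's reply only fetches articles newer than the user's last-seen timestamp plus those marked unread.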

So those are the best ideas I've come up with, but none seems
ideal.  Any suggestions would be greatly appreciated.

Thanks!
Tim
