On Wed, Aug 26, 2009 at 10:28 AM, Philippe <philippe.cr...@gmail.com> wrote:

>
> Nick, what about list deserialisation?
> Is there no impact on a large data set when each entity has a 30-element list?


There's a per-element cost to deserializing entities, yes - so larger lists
will take longer. 30 elements is not a particularly large list in the grand
scheme of things, however.

-Nick Johnson
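
To make the "proportional to the number of results" point and the merge-join remark quoted further down more concrete, here is a minimal in-memory sketch in Python. It is not the datastore's implementation: the function names (build_keyword_index, merge_join) and the sample records are hypothetical, and it assumes one index row per list element, kept sorted by record key. Under that assumption, answering a multi-keyword query only walks the rows for the requested keywords, so the work follows the size of those posting lists rather than the total number of stored records.

```python
from collections import defaultdict


def build_keyword_index(records):
    """Map each keyword to a sorted list of record keys.

    `records` is a dict of record key -> list of keywords. Each
    (keyword, key) pair stands in for one index row.
    """
    index = defaultdict(list)
    for key in sorted(records):            # sorted keys keep every posting list sorted
        for keyword in set(records[key]):  # ignore duplicate keywords on one record
            index[keyword].append(key)
    return index


def merge_join(index, keywords):
    """Return the keys that appear in the posting list of every keyword."""
    postings = [index.get(keyword, []) for keyword in keywords]
    if not postings or any(not p for p in postings):
        return []
    cursors = [0] * len(postings)
    matches = []
    while all(c < len(p) for c, p in zip(cursors, postings)):
        current = [p[c] for c, p in zip(cursors, postings)]
        target = max(current)
        if all(value == target for value in current):
            # Every list agrees on this key: it matches all keywords.
            matches.append(target)
            cursors = [c + 1 for c in cursors]
        else:
            # Advance only the cursors that are behind the largest key.
            cursors = [c + 1 if p[c] < target else c
                       for c, p in zip(cursors, postings)]
    return matches


# Hypothetical sample data, for illustration only.
records = {
    "a.com": ["php", "mysql"],
    "b.com": ["php"],
    "c.com": ["mysql", "php", "nginx"],
}
index = build_keyword_index(records)
print(merge_join(index, ["php", "mysql"]))
```

Adding more keywords adds more posting lists to walk in step, but records that carry none of the requested keywords are never touched. A production system would seek ahead in the index rather than step one row at a time; the stepping here is only to keep the sketch short.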


>
>
> On Aug 26, 11:22 am, "Nick Johnson (Google)" <nick.john...@google.com>
> wrote:
> > Hi Gary,
> > The time a query takes to execute is proportional to the number of
> > results returned, not the number in the datastore. I suspect your tests
> > have returned more results when you had more entries (because there were
> > more matches, and you were asking for them all)?
> >
> > -Nick Johnson
> >
> >
> >
> > On Wed, Aug 26, 2009 at 1:44 AM, Gary <gbre...@gmail.com> wrote:
> >
> > > Thanks for your help Philippe,
> >
> > > So does anyone have an idea of how to store this data in a scalable
> > > manner? I was sure Wooble's first answer was going to be the fix for
> > > this, but it appears it doesn't scale: CPU time is increasing to 3
> > > seconds with just 32,000 records. What will it be at about 1 million
> > > records?
> >
> > > Would be interested to hear from anyone who has successfully stored
> > > and queried a large dataset on appengine.
> >
> > > Thanks,
> >
> > > Gary
> >
> > > On Aug 26, 12:16 am, Philippe <philippe.cr...@gmail.com> wrote:
> > > > sad, I did not think about that :/
> >
> > > > On Aug 25, 2:49 pm, Gary <gbre...@gmail.com> wrote:
> >
> > > > > Hi,
> >
> > > > > > This solution doesn't seem to work, because
> >
> > > > > > ". in addition, the key_name of a keyword must be the actual
> > > > > > keyword" isn't possible: I will have duplicate keywords, and the
> > > > > > key name is actually just used as part of the key, so the data
> > > > > > gets converted to a string such as
> > > > > > "gZid29zZW1yGQsSBkRvbWFpbiINYnVpbHR3aXRoLmNvbQw"
> > > > > > as a key value.
> >
> > > > > Gary
> >
> > > > > On Aug 25, 9:18 pm, Philippe <philippe.cr...@gmail.com> wrote:
> >
> > > > > > another option could be:
> > > > > > > class Record(db.Model):
> > > > > > >     value = db.StringProperty()
> >
> > > > > > > class Keyword(db.Model):
> > > > > > >     # This value is not necessary, but I do not know if you
> > > > > > >     # can have a Model without properties.
> > > > > > >     value = db.StringProperty()
> >
> > > > > > > The idea is that for one Record, you store several keywords as
> > > > > > > Record children. In addition, the key_name of a keyword must be
> > > > > > > the actual keyword.
> > > > > > > Then you can simply get_by_key_name() the keyword, with a
> > > > > > > keys_only=true,
> > > > > > > and thus ask the key.parent() to get the correct Record.key().
> > > > > > > Finally, db.get(that record key).
> >
> > > > > > > A db.get() takes less than 100ms. I read somewhere that a
> > > > > > > get(keys) is always faster and will not get slower as your
> > > > > > > number of entities increases.
> >
> > > > > > On Aug 25, 11:52 am, Gary <gbre...@gmail.com> wrote:
> >
> > > > > > > Great,
> >
> > > > > > > > This is how I've done it. I've used the filter to do some
> > > > > > > > testing, and with only 32,424 values and approximately 500,000
> > > > > > > > keywords a search is taking 3000ms CPU. Will this stay the
> > > > > > > > same as my datastore expands to 4.5 million values and
> > > > > > > > potentially 100 million keywords?
> >
> > > > > > > Gary
> >
> > > > > > > On Aug 18, 10:42 pm, Wooble <geoffsp...@gmail.com> wrote:
> >
> > > > > > > > You can do it like (in python):
> >
> > > > > > > > > from google.appengine.ext import db
> >
> > > > > > > > > class Record(db.Model):
> > > > > > > > >     value = db.StringProperty()
> > > > > > > > >     keywords = db.StringListProperty()
> >
> > > > > > > > > No IN query is necessary to find values with a specific
> > > > > > > > > keyword or even multiple keywords, just do:
> > > > > > > > > filteredquery = (Record.all()
> > > > > > > > >                  .filter("keywords =", "mykeyword")
> > > > > > > > >                  .filter("keywords =", "myotherkeyword"))  # ...
> >
> > > > > > > > > You can basically add as many .filter()s as you want thanks
> > > > > > > > > to merge join.  Normalizing would require an exploding index
> > > > > > > > > if you ever want to search records having more than 1 given
> > > > > > > > > keyword. You might, however, want to optimize a bit for
> > > > > > > > > space by saving lists of integers and mapping those integers
> > > > > > > > > to keywords, although you'll end up with a performance hit
> > > > > > > > > when you search and storage is relatively cheap.
> >
> > > > > > > > On Aug 18, 2:26 am,garazy<gbre...@gmail.com> wrote:
> >
> > > > > > > > > Hi,
> >
> > > > > > > > > > I want to store 4.5 million data records, each of which has
> > > > > > > > > > a string identifier and 50 keywords associated with it.
> >
> > > > > > > > > For example,
> >
> > > > > > > > > value (string), keyword1 (string) ... keyword50 (string)
> >
> > > > > > > > > > In SQL Server 2008, a traditional database, the best way
> > > > > > > > > > I found was to store this in 3NF, such as
> >
> > > > > > > > > value_id (int)
> > > > > > > > > value (string)
> > > > > > > > > |
> > > > > > > > > |
> > > > > > > > > value_id (int)
> > > > > > > > > keyword_id (int)
> > > > > > > > > |
> > > > > > > > > |
> > > > > > > > > keyword_id (int)
> > > > > > > > > keyword_value (string)
> >
> > > > > > > > > > The data is mostly static. I typically run queries to find
> > > > > > > > > > values which use specific keywords, which in SQL uses IN
> > > > > > > > > > queries; however, I'm not sure if this is the best approach
> > > > > > > > > > for the App Engine datastore.
> >
> > > > > > > > > Do people recommend the data stored as-
> >
> > > > > > > > > value, keyword0, keyword1, keyword2 etc..
> >
> > > > > > > > > or in the same sort of way as I store it in SQL Server?
> >
> > > > > > > > > > What would the performance of the App Engine datastore be
> > > > > > > > > > for this kind of store if I wanted to find values which
> > > > > > > > > > have, say, 2 specific keywords?
> >
> > > > > > > > > Thanks,
> >
> > > > > > > > > Gary
> >
> > --
> > Nick Johnson, Developer Programs Engineer, App Engine
> >
>


-- 
Nick Johnson, Developer Programs Engineer, App Engine
