[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread boson

Tzakie,

A few things:

1. What is the shape of your data?  i.e. how big are your entities and
what do they contain?

2. How are you using entity groups?  Is all your data in a single
entity group (shared ancestor)?

3. Also try to experiment with permutations of your index and query to
try to isolate the bottleneck.  i.e. if you just fetch(500) on "SELECT
* FROM USER_INVENTORY", does it still take as long?  (and if so, then
rephrase the question to eliminate the index aspects)



On Jan 8, 10:48 am, Tzakie  wrote:
> I've spent a lot of time profiling and optimizing my app.
> I've watched all the Google I/O videos and done a ton of reading...
>
> But there is one query I can't optimize away:
>
> SELECT * FROM USER_INVENTORY WHERE TUSER_ID=637 AND CATEGORY_ID=2752
> ORDER BY ITEM_NUMBER ASC, ITEM_NAME, SUFFIX_NAME
>
> Yes there is a custom index for it
>
> It looks like a simple enough query, but it's taking the datastore
> 1.863699-2.170703 seconds to load and return about 500 records. App
> Engine translates 2 real seconds into 20 'processor seconds' and of
> course various quotas go crazy. It's a simple list of a web store's
> items in a category; with any more than 50 or so there is a
> noticeable delay.
>
> Why is it taking the datastore so long to return something like this?
>
> Entities seem to take forever to retrieve.
>
> Questions:
> I assume it's the number of returned entities here that causes the
> delay? E.g. the big penalty is for returning an entity, not for the
> query itself.
>
> Once you have a custom index the trailing ORDER BY items ITEM_NAME,
> SUFFIX_NAME don't matter much for speed since the key just gets a
> little longer.
>
> Do the number of properties on the entity matter much for query time?
>
> Is there some way to return JUST the properties I want? Would that
> help with speed?
>
> Is there some way to get just the keys back from a query and memcache
> the objects?
>
> Queries are limited to 1,000 results, which is reasonable, but the
> datastore is so slow you can't get more than about 50 records on the
> screen at once. It seems like there is a fundamental limit, e.g. if
> you want a list of 500 items, App Engine just can't do it no matter
> what the technique.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~--~~~~--~~--~--~---



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Tzakie


> 1. What is the shape of your data?  
> i.e. how big are your entities and what do they contain?

class USER_INVENTORY(db.Model):
    TUSER_ID = db.IntegerProperty()
    ITEM_TYPE = db.StringProperty()
    ITEM_ID = db.IntegerProperty()
    SUFFIX_ID = db.IntegerProperty()
    HAVE_INVENTORY = db.IntegerProperty()
    SELL_INVENTORY = db.IntegerProperty()
    SELL_RESERVED = db.IntegerProperty()
    SELL_AVAILABLE = db.IntegerProperty()
    SELL_PRICE = db.FloatProperty()
    SELL_IDEAL_INVENTORY = db.IntegerProperty()
    BUY_INVENTORY = db.IntegerProperty()
    BUY_RESERVED = db.IntegerProperty()
    BUY_AVAILABLE = db.IntegerProperty()
    BUY_PRICE = db.FloatProperty()
    SHIPPING_VALUE = db.FloatProperty()
    SELL_RANKING = db.IntegerProperty()
    VERSION = db.IntegerProperty()
    CATEGORY_ID = db.IntegerProperty()
    L0_CATEGORY_ID = db.IntegerProperty()
    L1_CATEGORY_ID = db.IntegerProperty()
    L2_CATEGORY_ID = db.IntegerProperty()
    L3_CATEGORY_ID = db.IntegerProperty()
    L4_CATEGORY_ID = db.IntegerProperty()
    L5_CATEGORY_ID = db.IntegerProperty()
    L6_CATEGORY_ID = db.IntegerProperty()
    L7_CATEGORY_ID = db.IntegerProperty()
    L8_CATEGORY_ID = db.IntegerProperty()
    L9_CATEGORY_ID = db.IntegerProperty()
    INVENTORY_TYPE = db.StringProperty()  # G-Global C-Custom A-Auction P-Pointer
    INVENTORY_NAME = db.StringProperty()
    ITEM_NUMBER = db.StringProperty()
    ITEM_NAME = db.StringProperty()
    SUFFIX_NAME = db.StringProperty()
    ATTR_Set = db.StringProperty()
    ATTR_Game = db.StringProperty()
    ATTR_Rarity = db.StringProperty()
    ATTR_Language = db.StringProperty()
    ATTR_Style = db.StringProperty()
    ATTR_Condition = db.StringProperty()
    ATTR_Edition = db.StringProperty()

This is really the minimum you need to describe an inventory object.
If the number of properties is the problem, can I somehow just return
a subset of the properties I need in the query?

> 2. How are you using entity groups?  Is all your data in a single
> entity group (shared ancestor)?
Not at all.

> 3. Also try to experiment with permutations of your index and query to
> try to isolate the bottleneck.  i.e. if you just fetch(500) on "SELECT
> * FROM USER_INVENTORY", does it still take as long?  (and if so, then
> rephrase the question to eliminate the index aspects)

Changing to SELECT * FROM USER_INVENTORY doesn't help;
it's the loop where the entities are being fetched that is slow.
Is there some way to not return all the properties? Can I just get
the keys that match the query?

I remember someone in the Google I/O videos saying not to sort or
filter in memory, so I was trying to stick with making indexes do the
work. After looking at this setup I think sorting in memory with
memcached objects might be the only way to get anything done. That's
rewriting a database in Python and using BigTable as nothing but a
file store.

Also, I can't see the logs anymore:
HTTP response was too large: 1808121. The limit is: 1048576.
Clear the logs, please.





[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Tzakie

Looking at it more deeply, every 20th one takes a long time. I assume
that's the data fetch.

Still need a way to optimize...



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Dan Sanderson
If you only need some of the properties for the query that needs 100+
results, you'll need to create a separate set of entities with just those
properties, and query those.  Similarly, if you want the query to return
just the keys, you'll need entities containing the properties that are the
subjects of query filters and the keys for the full entities.

100+ entities in a single request is a lot, especially with 40 properties on
each entity.  Smaller entities will get() faster, but you might also
consider avoiding needing so many results at once.  If you're hitting request
timeouts and really need that much data, you could spread the fetches
across multiple requests using JavaScript.  This won't reduce the total
time for the complete result set, but you could reduce perceived latency by
displaying the first 20 results immediately while the remaining 80 are
fetched.

If you're trying to deliver the results all at once like in a downloadable
spreadsheet, you'll have to get clever, maybe use memcache as a workspace
and build it over multiple requests.

-- Dan

On Thu, Jan 8, 2009 at 1:07 PM, Tzakie  wrote:

>
> Looking at it more deeply every 20th one takes a long time. I assume
> that's the data fetch.
>
> Still need a way to optimize...
> >
>




[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Tzakie

>Similarly, if you want the query to return
>just the keys, you'll need entities containing the properties that are the
>subjects of query filters and the keys for the full entities.

So there is no way to do:

QueryString = "SELECT * FROM USER_INVENTORY WHERE TUSER_ID=432 AND CATEGORY_ID=23423"
CategoryRows = db.GqlQuery(QueryString)

KeyNames = CategoryRows.getKeyNames()

without creating separate entities?

That's crazy. There must be a way to dig the identifiers out of the
db.GqlQuery() return.

Unless you can get the keys out and memcache them, I'm beginning to
think that beyond trivial applications App Engine is just unusable.
I've put so much time into this; would some expert please help me
out here?




[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread boson

BigTable is an object store, so you can't ask it for specific fields
like you can in a traditional database.  In the future we may get new
tools from Google that operate on cloud data and return a result
(MapReduce, etc.), but not yet.

You might try (like Dan said) breaking your entities up into smaller
parts, and using a common key_name format to identify parts.  I think
if you loaded a few hundred entities with just (e.g.) USER_INVENTORY,
TUSER_ID, ITEM_NAME, and ITEM_NUMBER, it would go much quicker.
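The common-key_name idea can be sketched in plain Python (dicts stand in for datastore kinds here, and `make_key_name`, `SLIM`, and `FULL` are hypothetical names; this does not use the real SDK):

```python
# Sketch only: store a slim "listing" record and a full record under the same
# key_name, scan the slim set for list pages, and load the full record only
# when a single item is actually viewed.

def make_key_name(tuser_id, item_id, suffix_id):
    # A hypothetical key_name convention shared by both entity kinds.
    return "%d:%d:%d" % (tuser_id, item_id, suffix_id)

# Simulated datastore kinds (dicts keyed by key_name).
SLIM = {}   # just the fields needed for the category listing
FULL = {}   # all ~40 properties

def put_item(tuser_id, item_id, suffix_id, item_name, item_number, **rest):
    key = make_key_name(tuser_id, item_id, suffix_id)
    SLIM[key] = {"ITEM_NAME": item_name, "ITEM_NUMBER": item_number,
                 "TUSER_ID": tuser_id}
    FULL[key] = dict(SLIM[key], **rest)

def list_for_user(tuser_id):
    # Cheap scan over slim records; returns (key_name, slim_record) pairs.
    return sorted((k, v) for k, v in SLIM.items()
                  if v["TUSER_ID"] == tuser_id)

put_item(637, 1, 0, "Black Lotus", "A-001", SELL_PRICE=9.99)
put_item(637, 2, 0, "Mox Ruby", "A-002", SELL_PRICE=4.50)
rows = list_for_user(637)
```

In the real datastore the slim and full records would be two entity kinds sharing a key_name, so loading the full entity after a slim query is a straight key lookup.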

Sorry this isn't what you wanted to hear.  Maybe somebody else has a
better answer.

On Jan 8, 2:00 pm, Tzakie  wrote:
> >Similarly, if you want the query to return
> >just the keys, you'll need entities containing the properties that are the
> >subjects of query filters and the keys for the full entities.
>
> so there is no way to
>
> QueryString="SELECT * FROM USER_INVENTORY WHERE TUSER_ID=432 AND
> CATEGORY_ID=23423"
> CategoryRows = db.GqlQuery(QueryString)
>
> KeyNames=CategoryRows.getKeyNames()
>
> Without creating separate entities?
>
> That's crazy. There must be a way to dig the identifiers out of the
> db.GqlQuery() return.
>
> Unless you can get the keys out and memcache I'm beginning to think
> beyond trivial
> applications app engine is just unusable. I've put so much time into
> this some expert
> person help me out here please.



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Dan Sanderson
There is currently no way to retrieve only parts of entities, nor just keys,
from the datastore in response to queries.  There's no way to dig out the
keys either: the datastore returns the full entities in response to queries;
there is no intermediate app-side step that fetches entities for keys.
The datastore is designed with the expectation that the complete entity is
needed as the result of a query.  If you're storing many facts about a
notional thing (such as a product in a catalog), and you only need some of
those facts back in response to a query, you may want to represent the thing
using multiple entities.  Depending on the nature of the queries, the
entities for a thing may store duplicate information.  You can manage this
duplication using the Python class API and transactions.

For what it's worth, I was once an engineer in the catalog department of a
major online retailer, and we used a similar technique for "fast" product
data, which showed up in search and browser results and the like, and "slow"
product data which was only needed on the product page.

-- Dan

On Thu, Jan 8, 2009 at 2:00 PM, Tzakie  wrote:

>
> >Similarly, if you want the query to return
> >just the keys, you'll need entities containing the properties that are the
> >subjects of query filters and the keys for the full entities.
>
> so there is no way to
>
> QueryString="SELECT * FROM USER_INVENTORY WHERE TUSER_ID=432 AND
> CATEGORY_ID=23423"
> CategoryRows = db.GqlQuery(QueryString)
>
> KeyNames=CategoryRows.getKeyNames()
>
> Without creating separate entities?
>
> That's crazy. There must be a way to dig the identifiers out of the
> db.GqlQuery() return.
>
> Unless you can get the keys out and memcache I'm beginning to think
> beyond trivial
> applications app engine is just unusable. I've put so much time into
> this some expert
> person help me out here please.
>
> >
>




[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Alexander Kojevnikov

> Looking at it more deeply every 20th one takes a long time. I assume
> that's the data fetch.
>
I guess you are iterating over a Query or GqlQuery object to get the
entities? This explains why every 20th iteration is slower. From
http://code.google.com/appengine/docs/datastore/queryclass.html :

  The iterator retrieves results from the datastore in small batches,
allowing for the app to stop iterating on results to avoid fetching
more than is needed.

If you know beforehand how many entities you need, try using the
fetch() method and iterating over the array of entities it returns.

This may be a bit faster, but still, your problem is the size of your
entities and their number. Try reducing either or both.

Dan and boson already covered the size part; regarding the number, do
you really need 500 entities at once? Unless you are going to
calculate some aggregate values, your users won't be able to digest so
much information on a page.
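The batching behaviour described above can be sketched in plain Python (not the SDK): the query iterator pulls results from the backend in small batches, so every 20th next() pays a round trip, while fetch(n) asks for everything up front.

```python
# Simulated backend: each backend_get call counts as one datastore round trip.

BATCH_SIZE = 20
round_trips = {"iter": 0, "fetch": 0}

def backend_get(records, offset, count, counter_key):
    round_trips[counter_key] += 1           # one simulated round trip
    return records[offset:offset + count]

def iterate_results(records):
    # Mimics iterating a Query: a new batch is requested every BATCH_SIZE items.
    offset = 0
    while True:
        batch = backend_get(records, offset, BATCH_SIZE, "iter")
        if not batch:
            return
        for r in batch:
            yield r
        offset += BATCH_SIZE

def fetch(records, limit):
    # Mimics query.fetch(limit): one request for the whole result set.
    return backend_get(records, 0, limit, "fetch")

data = list(range(500))
iterated = list(iterate_results(data))      # 26 round trips (25 batches + 1 empty)
fetched = fetch(data, 500)                  # 1 round trip
```

Both paths return the same 500 records; only the number of round trips differs, which is why fetch() is the better fit when you know the limit in advance.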

Cheers,
Alex
--
www.muspy.com



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Tzakie

Hey Dan and Boson,

Thanks very much for your time guys.

When you call:
CategoryRows = db.GqlQuery(QueryString)
It's fast.

It's the loop where you try to access the returned entities that
is slow. Doesn't it follow that CategoryRows has the keys in it
and the next() in the Python loop is fetching them? Every 20th
next() call also takes extra time, like it's fetching 20 at a
time from somewhere.

I don't think the problem is the nature of BigTable. The thing
is just 10 times too slow. It needs to be able to load about
1,000 records at most to make the returned record count a
non-issue. You need about 50 properties to make the property
count a non-issue. If the thing could return 1,000 entities or
50,000 properties in 0.5 seconds, the problem would go away.

You can cache them for the next user. We just need it to go
off once.

Right now it takes 3 seconds, upgraded to '30 processor seconds',
and the thing flips out.

To the smart guys at Google: putting a couple hundred inventory
lines in front of a user seems like something App Engine should
be able to do.




[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Dan Sanderson
On Thu, Jan 8, 2009 at 3:35 PM, Tzakie  wrote:

>
> Thanks very much for your time guys.
>

No problem, happy to help.


> When you call:
> CategoryRows = db.GqlQuery(QueryString)
> It's fast.
>
> It's the loop where you try and access the returned entities that
> is slow. Doesn't it follow that the CategoryRows has the keys in it
> and the next() in the python loop is fetching them? every 20th
> next() call is also takes extra time. Like it's fetching 20 at a
> time from some place.


Actually, db.GqlQuery() (and db.Query() and Model.all() and Model.gql())
only sets up the query; it does not execute it.  The query is executed when
you retrieve the results using q.fetch(), q.get(), or the iterator interface
(for result in q: ...).
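The lazy behaviour Dan describes can be sketched in plain Python (not the real SDK; `LazyQuery` is a stand-in class): constructing the query object is cheap and touches nothing, and the datastore round trip only happens when results are requested.

```python
# Minimal stand-in for a lazily executed query object.

class LazyQuery(object):
    def __init__(self, gql):
        self.gql = gql
        self.executed = False          # no datastore work yet

    def fetch(self, limit):
        self.executed = True           # the round trip happens here
        return ["entity-%d" % i for i in range(limit)]

q = LazyQuery("SELECT * FROM USER_INVENTORY WHERE TUSER_ID = 637")
was_executed_at_construction = q.executed   # False: nothing fetched yet
results = q.fetch(3)                        # now the "query" runs
```

This is why timing db.GqlQuery() alone looks fast: the cost has simply been deferred to the first fetch or iteration.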

We definitely hear your concerns about the desired scale of single
operations, and we're always working to improve these numbers.  But it's
worth considering that App Engine is a scalable system intended to offer the
same performance characteristics whether your data set has thousands of
items or millions of items.  To do that, it has to behave differently than a
single-server relational database.

Quite frankly, I can't think of a Google web app that displays 100 of
anything all at once...

-- Dan




[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Tzakie


> Quite frankly, I can't think of a Google web app that displays 100 of
> anything all at once...

I'm getting the impression that people think what I'm asking for is
ridiculous and off the radar. I sent you an e-mail with the URLs of
the current app and what I am working on. When you see it in context
I think it looks pretty reasonable.

E-commerce apps particularly need long lists for a lot of things. On
the e-commerce apps I build, paging kills customers. They don't
"next", they just leave.

Can't you guys make something that just returns the keys from a query?
That seems consistent with how I think BigTable works.



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Alex Popescu



On Jan 9, 4:15 am, Tzakie  wrote:
> [...]
>
> Can't you guys make something that just returns the keys from a query?
> That seems consistent with how I think big table works.

I'm pretty sure there should be a solution for this, as BigTable is
basically a distributed hashmap, so fetching only the key set should
not be a real problem. Now, I don't know the details, but the
implementation might be one where the entities are just serialized
and the engine uses surrogate keys (that are not visible to the app).

./alex



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Alex Popescu



On Jan 9, 1:22 am, Alexander Kojevnikov 
wrote:
> > Looking at it more deeply every 20th one takes a long time. I assume
> > that's the data fetch.
>
> I guess you are iterating over a Query or GqlQuery object to get the
> entities? This explains explain why every 20th iteration. 
> From http://code.google.com/appengine/docs/datastore/queryclass.html :
>
>   The iterator retrieves results from the datastore in small batches,
> allowing for the app to stop iterating on results to avoid fetching
> more than is needed.
>

That's a very interesting bit that I missed while reading the docs,
but it is quite consistent with the statistics I've noticed. Anyway,
from my tests, any result set that goes beyond 20-30 results quickly
degrades performance.

./alex



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-08 Thread Dan Sanderson
Repeating a couple of things I mentioned to Greg in email, for the list
discussion:
A single-server SQL database with low load and tables with rows numbering in
the thousands can do lots of cool things with that data in a reasonable
amount of time.  But many of the features of a SQL query engine have
difficulty scaling to hundreds of machines, millions of records, and 50+
qps.  The App Engine datastore is designed to scale, and inevitably lacks
features of single-server SQL databases that don't scale.  The datastore is
designed to maintain its performance characteristics over very large data
sets and heavy traffic, so performance with small data sets and low traffic
may not always compare to a single-server SQL database under similar
conditions, in the same sense that it won't compare to a RAM cache of the
same data.
I'll be the first to admit that this means some applications are not
well-suited to App Engine.  However, I wouldn't say that a table of 300
items of product data is necessarily impractical, even if it isn't as simple
as rendering the results of a SQL query.  One simple option in this case is
to store an entity per item as you're doing now, but use range queries and
multiple requests driven by JavaScript to build the page.  (Watch how Google
Reader renders lists of hundreds of items; you can barely tell the data
wasn't all there on the first request.)

Going to the primary database to retrieve 300 items of 40 fields each is
asking for 12,000 instantly dynamic fields for a single page.  That's a
*lot* to ask if the app expects to scale.  Part of scaling data is
determining how quickly updates need to be available, so you can
pre-calculate and cache results to optimize queries, i.e. make data that
doesn't need to be instantly dynamic less dynamic.

In the case of a product catalog, most of the catalog data can be largely
static.  You can pre-build and store the product lists, then store dynamic
data (like product availability) on separate entities queried at render
time.  The upcoming cron service will make this kind of pre-building and
cache warm-up easier to do, and in some cases you can just update lists and
caches when you update individual fields in primary storage.  You have the
right idea with wanting to cache this information, but you'll probably need
to do something other than performing the 12,000-field query during a user
request with the intent to cache.
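The pre-build-and-cache idea can be sketched in plain Python (a dict stands in for memcache, and `build_category_listing` is a hypothetical expensive query that, in the real app, would run offline rather than inside a user request):

```python
# Cache-aside pattern: user requests read the pre-built listing from the
# cache and only fall back to the expensive build on a miss.

CACHE = {}
build_calls = {"count": 0}

def build_category_listing(category_id):
    build_calls["count"] += 1           # track how often the expensive path runs
    return ["item-%d-%d" % (category_id, i) for i in range(3)]

def get_category_listing(category_id):
    key = "listing:%d" % category_id
    listing = CACHE.get(key)
    if listing is None:                 # cache miss: rebuild and store
        listing = build_category_listing(category_id)
        CACHE[key] = listing
    return listing

first = get_category_listing(2752)      # builds the list and caches it
second = get_category_listing(2752)     # served straight from the cache
```

With a cron job warming the cache and writes updating it in place, the 500-entity query never needs to run during a user request at all.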

Incidentally, returning just keys for a query has been on our radar for a
while, we just haven't gotten to it.  I can't promise anything, but it's on
our list.  Feel free to file a feature request in the issue tracker to
promote it.

-- Dan

On Thu, Jan 8, 2009 at 6:15 PM, Tzakie  wrote:

>
>
> > Quite frankly, I can't think of a Google web app that displays 100 of
> > anything all at once...
>
> I'm getting the impression that people think what I'm asking for is
> ridiculous
> and off the radar. I sent you an e-mail with the urls of the current
> app and
> what I am working on. When you see it in context I think it looks
> pretty
> reasonable.
>
> E commerce apps particularly need long lists for a lot of things. On
> the
> e-commerce apps I do paging kills customers. They don't "next" they
> just leave.
>
> Can't you guys make something that just returns the keys from a query?
> That seems consistent with how I think big table works.
> >
>




[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-09 Thread JC

Tzakie,

As Dan pointed out, GqlQuery isn't actually fetching the data when
constructed.  Try this instead:

CategoryRows = db.GqlQuery(QueryString)
results = CategoryRows.fetch(limit)

where limit is the max number of rows to fetch. I believe this will
make a single trip to the datastore rather than one every 20 objects
(which is done to make intrinsic iteration of a GqlQuery
performant).  I'd be curious to see what, if any, gain this yields
with your dataset.




[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-09 Thread Tzakie


> I believe this will
> make a single trip the datastore rather than once every 20 objects
> (which is done to make using intrinsic iteration of a GqlQuery
> performant).  I'd be curious to see what, if any, gain this yields
> with your dataset.

Takes exactly the same amount of time. I read through a lot
of the Google db libraries to see if I could dig the keys out
of the objects. I think fetch() just does the same loop for
you internally.



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-09 Thread Tzakie

> The datastore is
> designed to maintain its performance characteristics over very large data
> sets and heavy traffic, so performance with small data sets and low traffic
> may not always compare to a single-server SQL database under similar
> conditions. in the same sense that it won't compare to a RAM cache of the
> same data.

I'm not expecting it to work the same as SQL, but I do expect to
build a list of a couple hundred records. I'm not crazy; this is a
totally reasonable thing for an app to do. Sorry guys, the serial
return speed of these entity-loaded queries is an issue. It's not me,
it's you. And it's just a couple of changes away from working. I
can't change it myself, so I'm at your mercy, Google.

Can we allow your libraries to spawn threads? SimpleDB solves this by
allowing you to bang it with many threads at once. Even though the
fetch is slow for each entity, since you can do so many in parallel
it works out.

That way you could:

Keys = QueryObject.getKeys()
NeededEntities = []
for Key in Keys:
    # if key is not in memcache, add it to NeededEntities

DictionaryOfReturnedEntities = Model.fetchMassEntities(NeededEntities, Threads=30)

Now I want .fetchMassEntities(NeededEntities, Threads=30) to kick off
30 threads and get my entities, which I will cache. This should be
totally workable for lists of up to 1,000, I would think. It's not a
SQL vs. BigTable thing; it's an architecture issue with the current
setup.
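The fetchMassEntities idea above is hypothetical API, but the pattern it describes can be sketched in plain Python (App Engine's sandbox didn't allow user threads, so this is only an illustration; `fetch_entity` stands in for a slow per-key datastore get):

```python
import threading

def fetch_entity(key):
    # Stand-in for a single slow entity fetch.
    return {"key": key, "loaded": True}

def fetch_mass_entities(keys, num_threads=30):
    results = {}
    lock = threading.Lock()
    work = list(keys)

    def worker():
        while True:
            with lock:
                if not work:
                    return
                key = work.pop()        # claim one key under the lock
            entity = fetch_entity(key)  # fetch outside the lock, in parallel
            with lock:
                results[key] = entity

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

entities = fetch_mass_entities(["k%d" % i for i in range(100)], num_threads=8)
```

Since each fetch is independent, the wall-clock time approaches the slowest single fetch times the queue depth per thread, rather than the sum of all fetches.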

> Part of scaling data is
> determining how quickly updates need to be available, so you can
> pre-calculate and cache results to optimize queries, i.e. make data that
> doesn't need to be instantly dynamic less dynamic.

If I make a degenerate table modeling just the info I want to query
plus the stored key, isn't it the same number of entity fetches? Am
I getting punished by the number of returned entities, or by the
number of properties returned?

> Incidentally, returning just keys for a query has been on our radar for a
> while, we just haven't gotten to it.  I can't promise anything, but it's on
> our list.  Feel free to file a feature request in the issue tracker to
> promote it.

Will do. Thanks for your time Dan. :)



[google-appengine] Re: Is the Datastore just too slow for 100+ returned records?

2009-01-09 Thread bowman.jos...@gmail.com

For the record, I regularly pull 100 entities, much smaller than
yours, for my application in order to page through them. Basically,
to meet a paging requirement I pull 100, cache that result, then page
within it. I do think the size of your entities is part of the
problem.

Though I did find that with the same models I pull 100 of for
viewing, I had to drop down to 75 for deletion. Even pulling 75 to
delete, I tend to run into high CPU and datastore timeouts on many
requests. Overall, bulk management within the datastore is very
difficult.

On Jan 9, 1:22 pm, Tzakie  wrote:
> > The datastore is
> > designed to maintain its performance characteristics over very large data
> > sets and heavy traffic, so performance with small data sets and low traffic
> > may not always compare to a single-server SQL database under similar
> > conditions. in the same sense that it won't compare to a RAM cache of the
> > same data.
>
> I not expecting it to work the same as sql. But I do expect to build a
> list
> of a couple hundred records. I'm not crazy this is a totally
> reasonable thing
> for an app to do. Sorry guys the serial return speed of these entity
> loaded
> queries is an issue. It's not me it's you. And it's just a couple
> changes away
> from working. Can't change it myself so I'm at your mercy google.
>
> Can we allow your libraries to spawn threads? SimpleDB solves this by
> allowing you to bang it with many threads at once. Even though the
> fetch
> is slow for the entities since you can do so many in parallel it works
> out.
>
> That way you could:
>
> Keys=QueryObject.getKeys()
> NeededEntities=[]
> for Key in Keys:
>   #if key is not in memcache add to NeededEntities
>
> DictonaryOfReturnedEntities=Model.fetchMassEntities
> (NeededEntities,Threads=30)
>
> Now I want .fetchMassEntities(NeededEntities,Threads=30) to kick 30
> threads off
> and get my entities. Which I will cache. This should be totally
> workable for lists
> up to 1,000 I would think. It's not a sql vs BigTable think it's a
> architecture issue
> with the current setup.
>
> > Part of scaling data is
> > determining how quickly updates need to be available, so you can
> > pre-calculate and cache results to optimize queries, i.e. make data that
> > doesn't need to be instantly dynamic less dynamic.
>
> If I make a degenerate table model just the info I want to query and
> the
> key stored is it not just the same number of entity fetches right? I
> am I getting
> punished by the number or returned entities? or the number of
> properties
> returned?
>
> > Incidentally, returning just keys for a query has been on our radar for a
> > while, we just haven't gotten to it.  I can't promise anything, but it's on
> > our list.  Feel free to file a feature request in the issue tracker to
> > promote it.
>
> Will do. Thanks for your time Dan. :)