Any application that requires fetching an unbounded amount of data for
a single page view is not scalable, no matter what technology you use
to build it, so this problem is not App Engine-specific.

If you need aggregations (average, median, total, etc.), you have to
compute them incrementally or with an offline process.
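The incremental approach can be sketched in plain Python (class and
names here are illustrative, not an App Engine API; in a real app the
running totals would live on a datastore entity, e.g. a sharded
counter, updated in the same transaction as each write):

```python
# Sketch: maintain aggregates incrementally as items are written,
# instead of scanning all entities at read time.  Names are made up.
class RunningStats:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # On App Engine, this update would happen in the same
        # transaction as the entity write.
        self.count += 1
        self.total += value

    def average(self):
        return self.total / self.count if self.count else 0.0

stats = RunningStats()
for v in (10, 20, 30):
    stats.add(v)
print(stats.count, stats.total, stats.average())  # 3 60.0 20.0
```

Reads are then O(1) regardless of how many entities exist, which is
the point of the exercise.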

> when even with the "datetime <=" you still get a big set, how you can
> handle it?

We're talking about paging through a dataset, presenting n (for small
n) elements at a time to a user.

If we're paging through by the value of a field with distinct values
and we want to present 20 results per page, the query for the first
page is "order by field" with limit 20.  That query has a "last"
result.  The query for the next page is "field > {last result's field
value} order by field", again with limit 20.  That query also has a
last result, so the form of subsequent queries should be obvious.  (If
you've got other conditions, such as user id or key, you need to add
those as well.)
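In plain Python the scheme looks roughly like this (an in-memory
stand-in for the datastore; the real queries would be filters of the
form "field > :last order by field limit 20"):

```python
# Minimal illustration of value-based paging over distinct values.
# The sort stands in for the index; the datastore does this for you.
def page_query(items, last_value, limit):
    # "field > last_value ORDER BY field LIMIT limit"
    return sorted(v for v in items
                  if last_value is None or v > last_value)[:limit]

data = list(range(100, 0, -1))           # unsorted, distinct values
page1 = page_query(data, None, 20)       # first page
page2 = page_query(data, page1[-1], 20)  # next page: field > last
print(page1[0], page1[-1], page2[0], page2[-1])  # 1 20 21 40
```

No offset is involved, so the cost of fetching page k does not grow
with k.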

Suppose that entities can have the same field value.  If you don't
care how those entities are ordered, the first query's order by clause
can be "order by field, __key__", again limit 20.  The next query
tries to pick up entities with the same field as the last result from
the previous query.  It looks like "field = {last result's field's
value} and __key__ > {last result's key} order by __key__" and you
keep using it until it fails.  You then use a query like the "next
page" query from the previous case.  (I stopped mentioning limit
because the value depends on what you need to fill the current page.)
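A sketch of the tie-breaking scheme, again with an in-memory stand-in
((field, key) tuples play the role of entities and __key__; for
brevity this combines the "same field value" query and the "next page"
query into one function, which is equivalent to running them in
sequence as described above):

```python
# Paging when entities can share the same field value, using the key
# as a tie-breaker.  Sorting stands in for the datastore index.
def first_page(entities, limit):
    # "ORDER BY field, __key__ LIMIT limit"
    return sorted(entities)[:limit]

def next_page(entities, last, limit):
    # 1) "field = last.field AND __key__ > last.key ORDER BY __key__"
    same = sorted(e for e in entities
                  if e[0] == last[0] and e[1] > last[1])
    if len(same) >= limit:
        return same[:limit]
    # 2) once that fails to fill the page:
    #    "field > last.field ORDER BY field, __key__"
    rest = sorted(e for e in entities if e[0] > last[0])
    return (same + rest)[:limit]

ents = [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (3, 'a')]
p1 = first_page(ents, 2)         # [(1,'a'), (1,'b')]
p2 = next_page(ents, p1[-1], 2)  # [(1,'c'), (2,'a')]
p3 = next_page(ents, p2[-1], 2)  # [(2,'b'), (3,'a')]
print(p1, p2, p3)
```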



On Dec 22, 8:50 pm, ajaxer <calid...@gmail.com> wrote:
> when even with the "datetime <=" you still get a big set, how you can
> handle it?
> for example you get 10000 item with the most specific filtering sql.
> and on this filtering sql, you should have a statistic info. like how
> many item it is .
>
> how do you expect the appengine to handle this problem?
> how about at one request with many these actions?
>
> On Dec 21, 11:09 pm, Andy Freeman <ana...@earthlink.net> wrote:
>
>
>
> > What statistics are you talking about?
>
> > You're claiming that one can't page through an entity type without
> > fetching all instances and sorting them.  That claim is wrong because
> > the order by constraint does exactly that.
>
> > For example, suppose that you want to page through by a date/time
> > field named "datetime".  The query for the first page uses order by
> > datetime while queries for subsequent pages have a "datetime <="
> > clause for the last datetime value from the previous page and continue
> > to order by datetime.
>
> > What part of that do you think doesn't work?
>
> > Do you think that Nick was wrong when he said that the time to
> > execute such a query depends on the number of entities?
>
> > You can even do random access by using markers that are added/
> > maintained by a sequential process like the above.
>
> > On Dec 20, 7:34 pm, ajaxer <calid...@gmail.com> wrote:
>
> > > You misunderstand.
> > > if not show me a site with statistics on many fields.
> > > with more than 1000 pages please.
> > > thanks.
>
> > > On Dec 21, 9:06 am, Andy Freeman <ana...@earthlink.net> wrote:
>
> > > > You misunderstand.
>
> > > > If you have an ordering based on one or more indexed properties, you
> > > > can page efficiently wrt that ordering, regardless of the number of
> > > > data items.  (For the purposes of this discussion, __key__ is an
> > > > indexed property, but you don't have to use it or can use it just to
> > > > break ties.)
>
> > > > If you're fetching a large number of items and sorting so you can find
> > > > a contiguous subset, you're doing it wrong.
>
> > > > On Dec 19, 10:26 pm, ajaxer <calid...@gmail.com> wrote:
>
> > > > > obviously, if you have to page a data set more than 50000 items which
> > > > > is not ordered by __key__,
>
> > > > > you may find that the __key__  is of no use, because the filtered data
> > > > > is ordered not by key.
> > > > > but by the fields value, and for that reason you need to loop query as
> > > > > you may like to do.
>
> > > > > but you will encounter a timeout exception before you really finished
> > > > > the action.
>
> > > > > On Dec 19, 8:26 am, Andy Freeman <ana...@earthlink.net> wrote:
>
> > > > > > > if the type of data is larger than 10000 items, you need
> > > > > > > reindexing for this result.
> > > > > > > and recount each time for getting the proper item.
>
> > > > > > What kind of reindexing are you talking about.
>
> > > > > > Global reindexing is only required when you change the indices in
> > > > > > app.yaml.  It doesn't occur when you add more entities or have
> > > > > > big entities.
>
> > > > > > Of course, when you change an entity, it gets reindexed, but
> > > > > > that's a constant cost.
>
> > > > > > Surely you're not planning to change all your entities fairly often,
> > > > > > are you?  (You're going to have problems if you try to maintain
> > > > > > sequence numbers and do insertions, but that doesn't scale anyway.)
>
> > > > > > > it seems you have not encountered such a problem.
> > > > > > > on this situation, the indexes on the fields helps nothing for the
> > > > > > > bulk of data you have to be sorted is really big.
>
> > > > > > Actually I have.  I've even done difference and at-least-#
> > > > > > (intersection and union are special cases - at-least-# also handles
> > > > > > majority), at-most-# (binary xor is the only common case that I came
> > > > > > up with), and combinations thereof on paged queries.
>
> > > > > > Yes, I know that offset is limited to 1000 but that's irrelevant
> > > > > > because the paging scheme under discussion doesn't use offset.  It
> > > > > > keeps track of where it is using __key__ and indexed data values.
>
> > > > > > On Dec 16, 7:56 pm, ajaxer <calid...@gmail.com> wrote:
>
> > > > > > > of course the time is related to the type data you are fetching
> > > > > > > by one query.
>
> > > > > > > if the type of data is larger than 10000 items, you need
> > > > > > > reindexing for this result.
> > > > > > > and recount each time for getting the proper item.
>
> > > > > > > it seems you have not encountered such a problem.
> > > > > > > on this situation, the indexes on the fields helps nothing for the
> > > > > > > bulk of  data you have to be sorted is really big.
>
> > > > > > > On Dec 17, 12:20 am, Andy Freeman <ana...@earthlink.net> wrote:
>
> > > > > > > > > it still can result in timout if the data is really big
>
> > > > > > > > How so?  If you don't request "too many" items with a page
> > > > > > > > query, it won't time out.  You will run into
> > > > > > > > runtime.DeadlineExceededErrors if you try to use too many
> > > > > > > > page queries for a given request, but ....
>
> > > > > > > > > of no much use to most of us if we really have big data to
> > > > > > > > > sort and page.
>
> > > > > > > > You do know that the sorting for the page queries is done
> > > > > > > > with the indexing and not user code, right?  Query time is
> > > > > > > > independent of the total amount of data and depends only on
> > > > > > > > the size of the result set.  (Indexing time is constant per
> > > > > > > > inserted/updated entity.)
>
> > > > > > > > On Dec 16, 12:13 am, ajaxer <calid...@gmail.com> wrote:
>
> > > > > > > > > it is too complicated for most of us.
> > > > > > > > > and it still can result in timout if the data is really big
>
> > > > > > > > > of no much use to most of us if we really have big data to
> > > > > > > > > sort and page.
>
> > > > > > > > > On Dec 15, 11:35 pm, Stephen <sdea...@gmail.com> wrote:
>
> > > > > > > > > > On Dec 15, 8:04 am, ajaxer <calid...@gmail.com> wrote:
>
> > > > > > > > > > > also 1000 index limit makes it not possible to fetcher
> > > > > > > > > > > older data on paging.
>
> > > > > > > > > > > for if we need an indexed page more than 10000 items,
> > > > > > > > > > > it would cost us a lot of cpu time to calculate the base
> > > > > > > > > > > for GQL to fetch the data with index less than 1000.
>
> > > > > > > > > >http://code.google.com/appengine/articles/paging.html
>

--

You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.

