Re: [appengine-java] Re: Nearly doubled CPU usage since december 3rd

2009-12-08 Thread Eric Rannaud
On Tue, Dec 8, 2009 at 2:41 AM, SCMSoft  wrote:
> This seems very weird, as the pricing is all dependent on CPU time,
> and is similarly priced to, for instance, Amazon EC2. Now when the CPU
> seconds are not actually CPU seconds but some bogus unit (that happens
> to increase much faster than actual CPU usage), this suddenly makes
> appengine a lot less attractive.

You can read up on this subject at:
http://code.google.com/appengine/docs/quotas.html

"CPU time is reported in "seconds," which is equivalent to the number
of CPU cycles that can be performed by a 1.2 GHz Intel x86 processor
in that amount of time. The actual number of CPU cycles spent varies
greatly depending on conditions internal to App Engine, so this number
is adjusted for reporting purposes using this processor as a reference
measurement.
One tool to assist you in identifying areas in the application which
use high amounts of runtime CPU quota is the cProfile module. For
instructions on setting up profiling while debugging your application,
see "How do I profile my app's performance?".
You can examine the CPU time used to serve each request by looking at
the Logs section of the Admin Console. While profiling will assist in
identifying inefficient portions of your Python code, it's also
helpful to understand which datastore operations contribute to your
CPU usage.
Writes to the datastore use roughly 5 times as much CPU as reads.
Writes that update indexes require more CPU than writes that do not.
As the number of properties associated with a given entity increases,
so does the CPU time required to read and write that entity.
For the most part, queries are equally efficient, since all queries
use indexes. However, fetching results requires additional CPU Time."

--

You received this message because you are subscribed to the Google Groups 
"Google App Engine for Java" group.
To post to this group, send email to google-appengine-j...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine-java?hl=en.




Re: [appengine-java] Re: Datastore Statistics vs. Quota

2009-12-08 Thread Eric Rannaud
On Tue, Dec 8, 2009 at 1:26 AM, Toby  wrote:
> Reading the doc carefully it tells me that indexes are created based
> on the queries I make. Hence if I never query an entity on a certain
> property there should be no index.

That's not exactly what it says. Only the development server
"creates" indexes automatically, by inserting a few lines in your
datastore-indexes-auto.xml. (It doesn't actually build indexes, as the
development server doesn't need them to access data, I believe.)
The point of that mechanism is to make it easy for you to know which
indexes to request on the production appengine: just look in the
datastore-indexes-auto.xml, and copy the lines that you want from it.
If so configured, appcfg.sh will copy the entire content of
datastore-indexes-auto.xml to the production server.


> Also in the app-engine administration there is a menu entry "Indexes"
> which only lists two indexes that I am actually using.
>
> Now if I understand right per default many more indexes are created if
> I need them or not.  This should be mentioned more clearly because it
> could help to save a lot of space.

That's it. Most of the index space is likely coming from
implicit indexes.


> Is there a JDO annotation to disable indexes on properties or do I
> need to disable auto-index?

There is:

  @Persistent
  @Extension(vendorName = "datanucleus", key = "gae.unindexed", value="true")
  private String unindexedString;

It's in the doc:
http://code.google.com/intl/en/appengine/docs/java/datastore/queriesandindexes.html





Re: [appengine-java] Nearly doubled CPU usage since december 3rd

2009-12-07 Thread Eric Rannaud
On Mon, Dec 7, 2009 at 5:42 AM, SCMSoft  wrote:
> We always used to have ~100ms api cpu_ms, but cpu_ms used to be more
> like 50ms or so. We added logging on the time of entry and exit of
> doGet() and there was 58 ms difference in this case. How is it
> possible to have 210cpu_ms, which is ~100ms more than the api usage,
> while the request only lasted 58 ms (or 76 ms according to the
> appengine number)?

I cannot tell you why your CPU time seems to have increased, but know
that cpu_ms and api_cpu_ms are not using wall-clock milliseconds, but
"virtual" milliseconds, defined for the purpose of billing. They more
or less correspond to milliseconds on an old-ish machine (some kind of
Intel 1.2GHz, IIRC) -- and they depend on the type of request. They
are essentially an abstract unit. The app engine documentation gives
more detail:

http://code.google.com/appengine/docs/quotas.html
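
As a rough illustration of why a billed figure can exceed wall-clock
time, here is a minimal sketch that takes the documentation's
definition literally: reported CPU seconds are the cycles spent,
divided by a 1.2 GHz reference clock. The 3.0 GHz host speed and the
class and method names are hypothetical, and the real accounting is
internal to App Engine, so treat this as arithmetic only.

```java
final class CpuSecondsSketch {
    // Reference clock named in the quota documentation.
    static final double REFERENCE_HZ = 1.2e9;

    // Literal reading of the definition: cycles spent, expressed as
    // seconds on the 1.2 GHz reference processor.
    static double reportedCpuSeconds(double cycles) {
        return cycles / REFERENCE_HZ;
    }

    public static void main(String[] args) {
        // Hypothetical host: one core at 3.0 GHz, busy for 58 ms of wall-clock time.
        double cycles = 0.058 * 3.0e9;
        System.out.printf("%.3f reported CPU seconds%n", reportedCpuSeconds(cycles));
    }
}
```

Under these assumptions, 58 ms of wall-clock time would be reported as
about 145 cpu_ms, so a reported value above the request duration is not
by itself a contradiction.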

Google might have been better inspired to not label this unit
"millisecond". "Disney dollars" are meant to confuse you -- that might
not have been a good example to follow.


P.S. Turns out, Disney dollars are convertible and pegged to the USD.
Linden dollars, then.





Re: [appengine-java] Re: Datastore Statistics vs. Quota

2009-12-07 Thread Eric Rannaud
On Mon, Dec 7, 2009 at 2:20 AM, Toby  wrote:
> thank you for your update. In fact I was suspecting the index or other
> management data. But it is hard to believe that it leads to such a big
> overhead. I mean it is enormous to have an index that is 10 times more
> than the actual data, don't you think so?

If you look at the article, it doesn't seem that out of place.
Remember that by default, two indexes are built for every property,
EntitiesByProperty ASC and EntitiesByProperty DESC. If you look at the
number of fields in the corresponding tables (see article), and if
your entity has 5-10 fields, I would not be surprised by such an
overhead.

Marking properties as non-indexable, if you don't need the systematic
indexing that Google does, will help save a lot of space.
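
To get a feel for the numbers, here is a minimal sketch that counts
built-in index rows, assuming one ASC and one DESC row per indexed,
single-valued property per entity. The entity and property counts are
made up, the names are hypothetical, and the bytes per row (which the
article covers) are deliberately left out.

```java
final class IndexRowEstimate {
    // Two built-in index rows (ASC and DESC) per indexed property value.
    // Assumes every indexed property holds a single value.
    static long implicitIndexRows(long entities, int indexedProperties) {
        return entities * indexedProperties * 2L;
    }

    public static void main(String[] args) {
        long entities = 100_000L; // hypothetical entity count

        // All 10 properties indexed by default.
        System.out.println(implicitIndexRows(entities, 10));

        // Same data with 6 properties marked as unindexed.
        System.out.println(implicitIndexRows(entities, 4));
    }
}
```

With these made-up figures the row count drops from 2,000,000 to
800,000, which is why unindexing properties you never query on can
save a lot of space.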

> Furthermore in the datastore statistics they already list so called
> "Metadata". It is consuming about 44% of the space. I think that this
> is the index, is it not?

I don't think it is, no. I believe it refers to other kinds of
metadata (see article linked earlier in the thread). Index disk space
usage is, I believe, nowhere explicit.

Eric.





Re: [appengine-java] Re: Datastore Statistics vs. Quota

2009-12-05 Thread Eric Rannaud
On Sat, Dec 5, 2009 at 10:05 AM, Eric Rannaud  wrote:
> They also talked about an article they will publish soon that gives
> enough details on how indexes are built that you can at least predict
> the size of your indexes.

There it is:
http://code.google.com/appengine/articles/storage_breakdown.html





Re: [appengine-java] Re: Datastore Statistics vs. Quota

2009-12-05 Thread Eric Rannaud
On Sat, Dec 5, 2009 at 9:29 AM, Toby  wrote:
> I know that they are updated at least once a day. In my case the data
> volume has not changed for quite some time. And the discrepancy is
> really huge. I mean 50MB to 500MB. So maybe my datastore contains
> stuff that does not appear in the statistics. Or the statistics are
> wrong. I wonder if other people find the same ratio.  Is there any
> other way to get details on the data consumption of the datastore. Do
> e.g. log-files count into it or other data?

The difference is likely coming from the indexes, both implicit
indexes that Google builds systematically, and indexes that you have
requested explicitly.

When asked about the lack of visibility into index disk usage,
during the IRC office hours, the Google people said they were working
on improving that, and the admin console will likely give you more
details about where that extra space is going. It will be very useful,
as you can mark properties to be non-indexable, and thus save space.

They also talked about an article they will publish soon that gives
enough details on how indexes are built that you can at least predict
the size of your indexes.





[appengine-java] Re: [google-appengine] Re: Datastore is slow on queries involving many entities, but a smallish dataset

2009-12-02 Thread Eric Rannaud
Crossposting to App Engine Java group: the original thread is at
http://groups.google.com/group/google-appengine/browse_thread/thread/22018ef2e132ac13/54485b787d5a80b5

In a few words: I have a problem with reasonable queries taking a very
long time (several seconds). These queries return 128 entities, from a
total of 500,000 entities of that type in the datastore. Each entity
is about 400 bytes.

On Tue, Dec 1, 2009 at 6:49 PM, Stephen  wrote:
> On Dec 1, 9:12 pm, Eric Rannaud  wrote:
>> On Tue, Dec 1, 2009 at 11:02 AM, Stephen  wrote:
>>> On Dec 1, 9:55 am, Eric Rannaud  wrote:
>>>> SELECT * FROM MessageS where id >= 0 && id < 128 order by id
>>>>
>>>>     long t0 = Calendar.getInstance().getTimeInMillis();
>>>>     qmsgr = (List) qmsg.execute(lo, hi);
>>>>     System.err.println("getCMIdRange:qmsg: "
>>>>         + (Calendar.getInstance().getTimeInMillis() - t0));
>>
>>> Are you fetching all 128 entities in one batch? If you don't, the
>>> result is fetched in batches of 20, incurring extra disk reads and rpc
>>> overhead.
>>
>>> Not sure how you do that with the Java API, but with python you pass
>>> '128' to the .fetch() method of a query object.
>>
>> As far as I can tell, there is no such equivalent in the Java API. The
>
> Something like this..?
>
> DatastoreService datastore =
>    DatastoreServiceFactory.getDatastoreService();
>
> Query query = new Query("MessageS");
> query.addFilter("id", Query.FilterOperator.GREATER_THAN_OR_EQUAL, 0);
>
> List messages = datastore.prepare(query)
>    .asList(FetchOptions.Builder.withLimit(128));
>
> You might also have to tweak chunkSize and/or prefetchSize, or ask on
> the Java list.

I did some tests with the code you proposed. The performance remains
essentially the same as with the JDO API, i.e. between 1 and 4 seconds
per "execute"/"prepare" statement (2.5s on average).

DatastoreService datastore =
    DatastoreServiceFactory.getDatastoreService();
Query query = new Query("MessageS");
query.addFilter("id", Query.FilterOperator.GREATER_THAN_OR_EQUAL, lo);
query.addFilter("id", Query.FilterOperator.LESS_THAN, hi);

long t0 = Calendar.getInstance().getTimeInMillis();

List r = datastore.prepare(query)
    .asList(FetchOptions.Builder
        .withLimit(128)
        .prefetchSize(128)
        .chunkSize(128));

System.err.println("LOW:getCMIdRange:qmsg: "
    + (Calendar.getInstance().getTimeInMillis() - t0)
    + " " + r.size());
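
For repeated measurements like the one above, a small timing helper
avoids scattering Calendar calls around each query. This is only a
sketch using the standard library: the QueryTimer class and its timed
method are hypothetical names, and the lambda in main stands in for
the real datastore call.

```java
import java.util.function.Supplier;

final class QueryTimer {
    // Runs the given work, logs the elapsed wall-clock time to stderr,
    // and returns whatever the work returned.
    static <T> T timed(String label, Supplier<T> work) {
        long start = System.nanoTime(); // monotonic, unaffected by clock changes
        try {
            return work.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            System.err.println(label + ": " + elapsedMs + " ms");
        }
    }

    public static void main(String[] args) {
        // Stand-in for the datastore query; replace with the real call.
        int size = timed("LOW:getCMIdRange:qmsg", () -> 128);
        System.err.println("results: " + size);
    }
}
```

Note that this measures wall-clock time only, which is a different
quantity from the cpu_ms figures shown in the logs.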

Thanks.
