On 12 Mar 2010, at 16:28, Jeff Schnitzer wrote:

Look at these graphs:

http://code.google.com/status/appengine/detail/datastore/2010/03/12#ae-trust-detail-datastore-get-latency
http://code.google.com/status/appengine/detail/datastore/2010/03/12#ae-trust-detail-datastore-query-latency

Notice that a get()'s average latency is 50ms and a query()'s average
latency is 500ms.  Last week the typical query was averaging
800-1000ms with frequent spikes into 1200ms or so.

"You are increasing my suspicion that you have never worked" with an application that queries large amounts of data. If your queries are taking anywhere near 1000 ms then you must be doing something seriously wrong.

Query times in one of my apps are generally in the 200 ms range over 2 million records. A keys-only query can return in 50 ms.

This is the time required to execute 9 parallel queries on geospatial data and OR merge them together. Keep in mind that with Twig I could execute 90 parallel queries and expect the time to be about the same.
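To make the "parallel queries, then OR merge" idea concrete, here is a minimal sketch in plain Java. It is not Twig's code and does not touch the datastore: each sub-query is modelled as a hypothetical Callable returning a list of key strings, and a thread pool stands in for whatever asynchronous mechanism the datastore client actually provides. The class and method names are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Twig or the App Engine SDK.
class ParallelOrMerge {

    // Runs every sub-query concurrently and returns the union of their keys.
    // A key matched by more than one sub-query appears only once.
    static Set<String> orMerge(List<Callable<List<String>>> subQueries) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subQueries.size()));
        try {
            // invokeAll blocks until all tasks finish; futures are in submission order.
            List<Future<List<String>>> futures = pool.invokeAll(subQueries);
            Set<String> merged = new LinkedHashSet<>();
            for (Future<List<String>> future : futures) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sub-queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the sub-queries run side by side, the wall-clock time is roughly that of the slowest one rather than the sum of all of them, which is why going from 9 to 90 sub-queries need not change the total much.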


Deep down in the fiber of its being, BigTable is a key-value store.
It is very very efficient at doing batch gets.  It wants to do batch
gets all day long.  Queries require touching indexes maintained in
alternative tablets and comparatively, the performance sucks.

You are ignoring the fact that for many (most?) applications queries are essential. I completely understand that your Facebook app doesn't depend on them, but assuming that other people's apps also do not is just not helpful.

Why am I obsessed with batch gets?  Because they're essential for
making an application perform.  They're why there is such a thing as a
NoSQL movement in the first place.

Again, essential for your app. Not for mine, and probably not for many other apps in which querying their own data is more important. Batch gets are really only useful in apps that need to take a load of ids from an external source and do something with them. Social network "extension" apps, for example.

Just to reiterate - batch gets of external ids are a trivial feature that has always been planned as part of the new "load command", which will follow the pattern of the find and store commands.

* Fire off a batch job at your leisure to finish it off.

This "partial update" approach only works in cases where you are not adding a field that you will query on. That needs to be an all-or-nothing batch job.

What is with your obsession with batch gets? I understand they are central in Objectify because you are always loading keys. As I said already - even though this is not as essential in Twig, it will be added to a new load command.

Batch gets are *the* core feature of NoSQL databases, including the
GAE datastore.

Querying is important. You are ignoring a whole class of applications if you think that querying is not important. I understand that your application works with Facebook and does a lot of "lookups" by external ids in a large dataset, so to your mind batch get is the most important operation. This is really not such a common scenario as you social network developers might think.

One of the applications I work on has about 2 million records on which it needs to do geospatial queries, sorted and filtered. I guarantee you that there are many other applications that have different query needs, so to focus only on batch gets is myopic.

It probably explains why you don't think that OR queries are so important. They were one of the first things I tried on App Engine and one of the reasons Twig was written. I would bet that most developers could not imagine working with an RDBMS that did not support OR and AND queries (on more than one property). Twig's support for these saves time and reduces the complexity of the developer's app. With Objectify they are left on their own to re-invent the wheel every time.

The high-level design of Twig's commands means that ORs are supported now in the query API. Objectify's low-level design could only help out by providing helper classes - hardly user friendly or intuitive. The goal of Twig's design is to put these common solutions at the developer's fingertips. Yes, there are more methods in the API, but they are well organised using the fluent style commands.

The command pattern used by Twig has the potential to add new "high level" functionality for which Objectify's low-level query interface would need to rely on helper functions. For example, supporting AND queries with more than one inequality filter is in development. Just like the OR queries, it will "stream" results, never keeping more than a small number in memory.
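As a rough illustration of what "streaming" a merge can mean, the sketch below combines several already-sorted result iterators into one sorted, de-duplicated iterator while buffering only one element per source. It is an assumption-laden example in plain Java, not Twig's implementation: the class name is hypothetical, the sources are ordinary iterators rather than datastore cursors, and values are assumed to be non-null and sorted ascending.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical illustration of a streaming OR merge, not Twig's implementation.
class StreamingMerge<T extends Comparable<T>> implements Iterator<T> {

    // One buffered head element per source, ordered by its value.
    private static final class Head<T extends Comparable<T>> implements Comparable<Head<T>> {
        final T value;
        final Iterator<T> source;
        Head(T value, Iterator<T> source) { this.value = value; this.source = source; }
        @Override public int compareTo(Head<T> other) { return value.compareTo(other.value); }
    }

    private final PriorityQueue<Head<T>> heads = new PriorityQueue<>();
    private T last; // last value handed out, used to skip duplicates

    StreamingMerge(List<Iterator<T>> sortedSources) {
        for (Iterator<T> source : sortedSources) {
            if (source.hasNext()) heads.add(new Head<>(source.next(), source));
        }
    }

    @Override public boolean hasNext() {
        skipDuplicates();
        return !heads.isEmpty();
    }

    @Override public T next() {
        skipDuplicates();
        if (heads.isEmpty()) throw new NoSuchElementException();
        last = advance();
        return last;
    }

    // Removes the smallest head and refills the buffer from the same source.
    private T advance() {
        Head<T> head = heads.poll();
        if (head.source.hasNext()) heads.add(new Head<>(head.source.next(), head.source));
        return head.value;
    }

    // Drops heads equal to the value already returned, so each value appears once.
    private void skipDuplicates() {
        while (last != null && !heads.isEmpty() && heads.peek().value.compareTo(last) == 0) {
            advance();
        }
    }
}
```

Memory use is proportional to the number of sources, not the number of results, so the caller can stop iterating early without the remaining results ever being fetched.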

These are the types of common problems that take a lot of time to code. I'm not saying that this is impossible to code with Objectify - just that it is up to the developer to code these patterns again and again. Re-inventing the wheel is one of the biggest wasters of a developer's time. That and long discussions on mailing lists :)

I do appreciate Objectify's simplicity - but it is built on a system that is already too simple to be very usable for a lot of apps. In my mind the goal of a good GAE framework should be to make the difficult easy - not just to make the easy typesafe and pretty. I think that really sums up the different goals of Twig and Objectify.


John
