On 13 Mar 2010, at 13:14, Jeff Schnitzer wrote:

Some queries will certainly return faster than others, and from what
I've read/watched, keys-only queries should have performance profiles
roughly similar to simple gets.  But there can be no doubt that real
queries are quite slow compared to simple gets.

My point is that you cannot just replace a query with gets. If your app needs gets then use gets. If your app needs queries then do queries. You really don't have the choice of one or the other just because one is faster.

But you're arguing with a straw man here.  I've never suggested that
queries are not useful.

However, you *have* suggested that batch gets aren't important.
"Batch gets are really only useful in apps that need to take a load of
ids from an external source and do something with them."  That's
absolute rubbish.  A very large (and growing) number of applications
are being built on NoSQL databases that are effectively key-value
stores.

I did not say that batch gets are not important. I was responding to your claim that they were more important than queries, which may be true in your app but not in many others - certainly not mine. You need to recognise that different apps have different requirements and many apps require queries.

 Cassandra, Tokyo Cabinet, HBase, Voldemort, and *dozens* of
other tools are being developed because they can do something that
relational systems can't:  get() and put() vast quantities of data
quickly.

Brilliant. Good for them. I really don't see what that has to do with making queries easier or faster.

There are a growing number of applications (largely defined by
staggeringly large user bases) in which the cost of maintaining
traditional indexes is not practical.  You aren't going to implement
Twitter or Facebook with a bunch of appengine queries!  But apparently
Cassandra works great.

I am sure Facebook has a few queries in there somewhere :) I would certainly not say that any one type of datastore operation is all that is required for every app. Our different focuses show that clearly. A good GAE framework should support hard queries (OR, AND on multiple properties) and *also* batch gets.

I have to ask you something though - would you need to do 9 parallel
queries if you were working with a datastore that has proper spatial
indexes?  Not that doing parallel queries isn't cool, but is it
actually necessary for your app?

If it wasn't for the spatial indexes I would still need ORs - in fact on average I query 4-6 geo "blocks" but each may require a few ORs so potentially 18 or more queries for one search. Luckily that is not a common case.
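
To make the "several queries for one search" point concrete, here is a minimal sketch of how an OR can be emulated by running one sub-query per branch and merging the results by key. It is an illustration only: OrQuerySketch, Row and or() are hypothetical names, not Twig, Objectify or App Engine SDK APIs, and each sub-query is modelled as a plain Supplier instead of a real datastore call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from Twig, Objectify or the App Engine SDK.
final class OrQuerySketch {

    // One result row: the entity key plus whatever payload the sub-query returned.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Runs every sub-query in turn and unions the results.
    // An entity matched by more than one branch is kept once (first occurrence wins).
    static List<Row> or(List<Supplier<List<Row>>> subQueries) {
        Map<String, Row> byKey = new LinkedHashMap<>();
        for (Supplier<List<Row>> subQuery : subQueries) {
            for (Row row : subQuery.get()) {
                byKey.putIfAbsent(row.key, row);
            }
        }
        return new ArrayList<>(byKey.values());
    }
}
```

With 4-6 geo blocks and a few OR branches per block, the list passed to or() would hold one Supplier per block/branch combination, which is where a figure like 18 sub-queries comes from. A real implementation would presumably issue the sub-queries in parallel; this sketch runs them sequentially to stay short.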

I'm not doing spatial queries right now, but it's on the horizon.
I've done the research.  For my application, it's much easier and more
efficient to push my spatial queries off to a cluster of PostGIS
instances running elsewhere in the cloud.  It's also much, much
cheaper.

I might need to look into this as traffic grows. Incredibly, the keys-only part of these queries now returns in about 50-70ms although the CPU cost is still high.

This "partial update" approach only works in cases where you are not adding a field that you will query on. That needs to be an all-or-nothing batch
job.

Nonsense, this is totally dependent on the specific logic of your application.

Simple example:  You're adding a loginCount to your User entity, and
you want to add a query that selects out users that have logged in
more than N times.  No reason you can't start running those queries
right away.

Not if you have not indexed the login count, or if it needs to be extracted from some other data.
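
The indexing point can be shown with a small in-memory model. This is a sketch under stated assumptions, not datastore code: IndexSketch, put() and loginCountGreaterThan() are hypothetical names, and the TreeMap stands in for a single-property index. The behaviour it models is that a filter on a property only finds entities that were written with an indexed value for that property.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model; it does not use the App Engine SDK.
final class IndexSketch {

    // Primary storage: user key -> loginCount (null until the entity has been upgraded).
    private final Map<String, Integer> entities = new HashMap<>();

    // Stand-in for a property index: loginCount -> keys of users with that count.
    private final TreeMap<Integer, List<String>> loginCountIndex = new TreeMap<>();

    // Saves a user. An index entry is written only when loginCount is present.
    void put(String key, Integer loginCount) {
        Integer old = entities.put(key, loginCount);
        if (old != null) {
            loginCountIndex.get(old).remove(key); // drop the stale index entry
        }
        if (loginCount != null) {
            loginCountIndex.computeIfAbsent(loginCount, c -> new ArrayList<>()).add(key);
        }
    }

    // "Users who have logged in more than n times", answered from the index alone.
    List<String> loginCountGreaterThan(int n) {
        List<String> keys = new ArrayList<>();
        for (List<String> bucket : loginCountIndex.tailMap(n, false).values()) {
            keys.addAll(bucket);
        }
        return keys;
    }
}
```

A user stored before the upgrade (loginCount still null) never shows up in loginCountGreaterThan(), however many times they have actually logged in. Whether that is acceptable during a partial update depends on whether the count starts from zero or has to be derived from existing data.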


You're trying to dismiss the utility of upgrading the dataset in-place
by saying that *some* application features require the dataset to be
completely transitioned before being enabled.  Ok, some do some don't.
Your claim is still absurd.

No, I have already said that I can see a benefit in certain cases - such as your Facebook example - of in-place changes. But it is not the best solution for every type of schema update.

It probably explains why you don't think that OR queries are so important.

The reason OR queries aren't high on our priority list is because
nobody has been asking for them.

Well, don't implement them if you don't need them. Get to work on that website instead! :)

They were one of the first things I tried on App Engine and one of the
reasons Twig was written.  I would bet that most developers could not
imagine working with an RDBMS that did not support OR and AND queries (on more than one property). Twig's support for these saves time and reduces the complexity of the developer's app. With Objectify they are left on their own
to re-invent the wheel every time.

Our conceptual model of the datastore is not an RDBMS.  It's a
key-value store that also allows limited queryability.  If you really
want an RDBMS, I'm sure the Cloud2db guys will be happy to chime in
again.

I definitely do not want an RDBMS. I want (and have) an Object persistence interface which makes querying as easy as possible.

John

--
You received this message because you are subscribed to the Google Groups "Google 
App Engine for Java" group.
To post to this group, send email to google-appengine-j...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine-java?hl=en.
