Re: [appengine-java] Objectify - Twig - approaches to persistence

John Patterson Sat, 13 Mar 2010 00:40:44 -0800

On 13 Mar 2010, at 13:14, Jeff Schnitzer wrote:


Some queries will certainly return faster than others, and from what
I've read/watched, keys-only queries should have performance profiles
roughly similar to simple gets.  But there can be no doubt that real
queries are quite slow compared to simple gets.

My point is that you cannot just replace a query with gets. If your app needs gets then use gets. If your app needs queries then do queries. You really don't have the choice of one or the other just because one is faster.

But you're arguing with a straw man here.  I've never suggested that
queries are not useful.

However, you *have* suggested that batch gets aren't important.
"Batch gets are really only useful in apps that need to take a load of
ids from an external source and do something with them."  That's
absolute rubbish.  A very large (and growing) number of applications
are being built on NoSQL databases that are effectively key-value
stores.

I did not say that batch gets are not important. I was responding to your claim that they were more important than queries, which may be true in your app but not in many others - certainly not mine. You need to recognise that different apps have different requirements and many apps require queries.

 Cassandra, Tokyo Cabinet, HBase, Voldemort, and *dozens* of
other tools are being developed because they can do something that
relational systems can't:  get() and put() vast quantities of data
quickly.

Brilliant. Good for them. I really don't see what that has to do with making queries easier or faster.

There are a growing number of applications (largely defined by
staggeringly large user bases) in which the cost of maintaining
traditional indexes is not practical.  You aren't going to implement
Twitter or Facebook with a bunch of appengine queries!  But apparently
Cassandra works great.

I am sure Facebook has a few queries in there somewhere :) I would certainly not say that any one type of datastore operation is all that is required for every app. Our different focuses show that clearly. A good GAE framework should support hard queries (OR, AND on multiple properties) and *also* batch gets.

I have to ask you something though - would you need to do 9 parallel
queries if you were working with a datastore that has proper spatial
indexes?  Not that doing parallel queries isn't cool, but is it
actually necessary for your app?

If it wasn't for the spatial indexes I would still need ORs - in fact on average I query 4-6 geo "blocks" but each may require a few ORs so potentially 18 or more queries for one search. Luckily that is not a common case.
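
For readers following the thread, the multi-query OR described above can be sketched roughly as follows. This is a minimal illustration in plain Java and not Twig's implementation: `OrQuerySketch`, `runOr`, and the string keys are hypothetical names, and an ordinary thread pool stands in for whatever parallel or asynchronous mechanism the platform actually provides. Each sub-query returns a list of keys; the results are merged and duplicates dropped, since one entity can match more than one branch of the OR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: an OR expressed as several independent sub-queries.
class OrQuerySketch {

    // Runs all sub-queries in parallel and merges the keys they return,
    // removing duplicates while keeping first-seen order.
    static Set<String> runOr(List<Callable<List<String>>> subQueries) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, subQueries.size()));
        try {
            // invokeAll blocks until every sub-query has completed.
            List<Future<List<String>>> futures = pool.invokeAll(subQueries);
            Set<String> merged = new LinkedHashSet<>();
            for (Future<List<String>> future : futures) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two assumed geo "blocks" whose results overlap on one key.
        List<Callable<List<String>>> blocks = new ArrayList<>();
        blocks.add(() -> Arrays.asList("place:1", "place:2"));
        blocks.add(() -> Arrays.asList("place:2", "place:3"));
        System.out.println(runOr(blocks));
    }
}
```

The merge step is where the extra cost of OR shows up: every branch is a separate query, so a search over several blocks with a few ORs each multiplies quickly, as the figures above suggest.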

I'm not doing spatial queries right now, but it's on the horizon.
I've done the research.  For my application, it's much easier and more
efficient to push my spatial queries off to a cluster of PostGIS
instances running elsewhere in the cloud.  It's also much, much
cheaper.

I might need to look into this as traffic grows. Incredibly, the keys-only part of these queries now returns in about 50-70ms although the CPU cost is still high.

This "partial update" approach only works in cases where you are not
adding a field that you will query on. That needs to be an
all-or-nothing batch job.

Nonsense, this is totally dependent on the specific logic of your
application.
Simple example:  You're adding a loginCount to your User entity, and
you want to add a query that selects out users that have logged in
more than N times.  No reason you can't start running those queries
right away.

Not if you have not indexed the login count, or if it needs to be extracted from some other data.

You're trying to dismiss the utility of upgrading the dataset in-place
by saying that *some* application features require the dataset to be
completely transitioned before being enabled.  Ok, some do some don't.
Your claim is still absurd.

No, I have already said that I can see a benefit in certain cases - such as your Facebook example - of in-place changes. But it is not the best solution for every type of schema update.
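
The disagreement above can be made concrete with a small sketch. It is plain Java over in-memory maps, not Objectify or Twig code; `LazyUpgradeSketch`, `upgrade`, and `loggedInMoreThan` are hypothetical names. The one datastore behaviour it assumes is that an entity which lacks a property does not appear in that property's index, so a filter on the property never matches it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of upgrading entities in place, one at a time.
class LazyUpgradeSketch {

    // Builds a stand-in entity; a null loginCount models an entity
    // stored before the property was introduced.
    static Map<String, Object> user(String name, Long loginCount) {
        Map<String, Object> entity = new HashMap<>();
        entity.put("name", name);
        if (loginCount != null) {
            entity.put("loginCount", loginCount);
        }
        return entity;
    }

    // Upgrade-on-load: add the new property with a default the first
    // time an old entity is read.
    static Map<String, Object> upgrade(Map<String, Object> entity) {
        entity.putIfAbsent("loginCount", 0L);
        return entity;
    }

    // Simulates an index-backed filter "loginCount > n": entities
    // without the property are never matched.
    static List<String> loggedInMoreThan(List<Map<String, Object>> users, long n) {
        List<String> names = new ArrayList<>();
        for (Map<String, Object> entity : users) {
            Object count = entity.get("loginCount");
            if (count instanceof Long && (Long) count > n) {
                names.add((String) entity.get("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> users = new ArrayList<>();
        users.add(user("alice", null)); // not yet upgraded
        users.add(user("bob", 5L));     // already has the property
        System.out.println(loggedInMoreThan(users, 3));
    }
}
```

With a default of zero, the un-upgraded entity is missing from the result both before and after its upgrade, so the query can run during the transition. If the value had to be derived from existing data instead, the same entity could be wrongly absent until it was rewritten, which is the case where a full batch job is needed first.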

It probably explains why you don't think that OR queries are so
important.

The reason OR queries aren't high on our priority list is because
nobody has been asking for them.

Well don't implement them if you don't need them. Get to work on that website instead! :)

They were one of the first things I tried on App Engine and one of the
reasons Twig was written.  I would bet that most developers could not
imagine working with an RDBMS that did not support OR and AND queries
(on more than one property). Twig's support for these saves time and
reduces the complexity of the developer's app. With Objectify they are
left on their own to re-invent the wheel every time.

Our conceptual model of the datastore is not an RDBMS.  It's a
key-value store that also allows limited queryability.  If you really
want an RDBMS, I'm sure the Cloud2db guys will be happy to chime in
again.

I definitely do not want an RDBMS. I want (and have) an Object persistence interface which makes querying as easy as possible.


John
