On Dec 6, 9:46 pm, Nick Johnson <[EMAIL PROTECTED]> wrote:
> On Dec 6, 1:47 am, lock <[EMAIL PROTECTED]> wrote:
>
> > Thanks guys, that's what I was hoping to hear, you saved me a couple
> > hours trying to prove it for myself (not to mention the frustration).
> > After I went away and thought about it some more I figured there must
> > be some 'smarts' in the database to prevent the query time from
> > increasing.  Otherwise how could any database scale well...
>
> > No merge joins or IN operators in my code, so nothing to worry about
> > there.
>
> > After a _lot_ more testing I'm finding that query time does scale with
> > the number of fetched _results_, not the DB size.  During early
> > testing I convinced myself that increasing the DB size was slowing my
> > query down, when really the number of results was increasing as I
> > added more data, doh (it was getting late ;-)  ).
>
> One thing to bear in mind is that the dev_appserver performance is
> _not_ representative of the production performance. The dev_appserver
> holds the entire dataset in memory and does linear scans over an
> entity type for queries, so performance there _will_ degrade with
> respect to the size of an entity type.

Oh really!  That may have also contributed to my initial theory about DB performance being adversely affected by its size.  Thanks for the tip, definitely something to keep in mind.  Hopefully, in future versions of the SDK, the dev server will start to better mimic the behavior of the actual App Engine framework.  It would be really great if, for example, it gave similar CPU usage warnings.

>
>
> > The overall solution that seems to be working well for me at the
> > moment is to have different tables for different resolutions.  As the
> > size of the geometric bounds increases I switch between a few tables,
> > each one with a lower fidelity therefore reducing the number of
> > results that can be returned.  Visually it works similar to Level Of
> > Detail techniques you see in some 3D modeling packages.
>
> I'm curious how you're doing this with only a limited number of
> queries. Geohashing isn't ideally suited to satisfying bounding box
> queries (though it's certainly better than storing plain lat/longs).

Please tell me if I'm wrong, but isn't geohashing the only way you can do a bounding-box-type query with a datastore query?  I must admit during early development I just assumed I was going to be able to do a query something like:

'SELECT * WHERE lat < top AND lat > bottom AND long > left AND long < right ...'

Got a bit of a shock when I found I could only apply inequality filters to one field.
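
To make that concrete, here's a rough sketch in Python of the query I assumed would work versus what the datastore will accept.  The Point model and the bound values are made up for illustration:

    from google.appengine.ext import db

    class Point(db.Model):          # hypothetical model
        lat = db.FloatProperty()
        lng = db.FloatProperty()

    top, bottom, left, right = -37.0, -38.0, 144.0, 145.0  # box edges

    # What I assumed would work -- rejected by the datastore, since
    # inequality filters may only be applied to a single property:
    # q = (Point.all().filter('lat <', top).filter('lat >', bottom)
    #                 .filter('lng >', left).filter('lng <', right))

    # What is actually allowed: inequalities on one property only.
    q = Point.all().filter('lat <', top).filter('lat >', bottom)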

The only other way I thought of doing it was to query based on longitude, then just filter the results by lat in a loop afterwards.  Knowing what I do now (queries that return a lot of results chew up CPU cycles), I'd say this would be the wrong approach.
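
Roughly, it would have looked like this (continuing with the made-up Point model and bounds from the sketch above):

    # Query on one axis, filter the other in Python.  Every entity in
    # the longitude band gets fetched, which is where the CPU goes.
    candidates = (Point.all().filter('lng >', left)
                             .filter('lng <', right).fetch(1000))
    in_box = [p for p in candidates if bottom < p.lat < top]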

As for the level of detail stuff, it's nothing too sophisticated; I'll try to elaborate.  It's unrelated to geohash.

My app has 4 tables: 1 contains all data points, the other 3 are of varying resolutions (LOD tables).

When adding a point (lat/long), it gets put into the table containing all data points.  Next I start adding this same point to the appropriate LOD tables: for the 'high res' one I round the lat/long to 2 decimal places and compute the geohash.  If the geohash is already present in the table, then this point has been fully added; otherwise it is added to the 'high res' table and we continue.  The same lat/long is then rounded to 1 decimal place, its geohash is calculated and checked against the 'medium res' LOD table; if present, just return.  If not, do something similar again for the 'low res' LOD table.
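
In code, the insert path looks roughly like this.  The model names, the 0-decimal-place rounding for 'low res', and geohash_encode() are stand-ins rather than my actual code; using the rounded point's geohash as the entity key_name makes the "already present?" check a cheap get_by_key_name():

    from google.appengine.ext import db

    class PointHighRes(db.Model):   # lat/lng rounded to 2 places
        lat = db.FloatProperty()
        lng = db.FloatProperty()

    class PointMedRes(db.Model):    # rounded to 1 place
        lat = db.FloatProperty()
        lng = db.FloatProperty()

    class PointLowRes(db.Model):    # rounded to 0 places (a guess)
        lat = db.FloatProperty()
        lng = db.FloatProperty()

    LOD_TABLES = [(PointHighRes, 2), (PointMedRes, 1), (PointLowRes, 0)]

    def add_point(lat, lng):
        Point(lat=lat, lng=lng).put()         # full-resolution table
        for model, places in LOD_TABLES:
            rlat, rlng = round(lat, places), round(lng, places)
            key = geohash_encode(rlat, rlng)  # assumed geohash helper
            if model.get_by_key_name(key):
                return  # coarser tables were covered by an earlier add
            model(key_name=key, lat=rlat, lng=rlng).put()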

Points are obtained from the app by a bounding box: the lat/long of the NE and SW corners.  From these we can calculate a rough size unit for the bounding box; at the moment I'm using the squared diagonal length in degrees.  From this number we determine which table to query.  For large bounding boxes the 'low res' LOD table is used; for small boxes the 'high res' LOD table is used.  For even smaller bounding boxes I just get the results out of the table containing all data points.
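
The selection step amounts to something like the following; the thresholds here are invented for illustration, not the ones the app actually uses:

    def pick_table(ne_lat, ne_lng, sw_lat, sw_lng):
        # Squared diagonal length of the box, in degrees.
        size = (ne_lat - sw_lat) ** 2 + (ne_lng - sw_lng) ** 2
        if size > 25.0:       # large box -> coarse data
            return PointLowRes
        if size > 1.0:
            return PointMedRes
        if size > 0.05:
            return PointHighRes
        return Point          # small box -> full-resolution table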

Hope that made sense.

Anyway, if you want to see it in action, check out 'bikebingle.appspot.com'.  Please enter as much random data as you want; all the stuff in there at the moment is just test data and will be removed soonish.  If you find any bugs while you're there, I'd love to know about them :-), hopefully BikeBingle will be 'going live' in the next couple of days.  BTW, I wouldn't click the 'Make random' button (it's there for debugging purposes), it fires off 100 POST requests....  Of course, by saying that, I know someone's going to click it ;-)