On Dec 6, 9:46 pm, Nick Johnson <[EMAIL PROTECTED]> wrote:
> On Dec 6, 1:47 am, lock <[EMAIL PROTECTED]> wrote:
>
> > Thanks guys, that's what I was hoping to hear, you saved me a couple
> > hours trying to prove it for myself (not to mention the frustration).
> > After I went away and thought about it some more I figured there must
> > be some 'smarts' in the database to prevent the query time from
> > increasing. Otherwise how could any database scale well...
>
> > No merge joins or IN operators in my code, so nothing to worry about
> > there.
>
> > After a _lot_ more testing I'm finding that query time does scale with
> > the number of fetched _results_, not the DB size. During early
> > testing I convinced myself that increasing the DB size was slowing my
> > query down, when really the number of results were increasing as I
> > added more data, doh (it was getting late ;-) ).
>
> One thing to bear in mind is that the dev_appserver performance is
> _not_ representative of the production performance. The dev_appserver
> holds the entire dataset in memory and does linear scans over an
> entity type for queries, so performance there _will_ degrade with
> respect to the size of an entity type.

Oh really! That may have also contributed to my initial theory about
DB performance being adversely affected by its size. Thanks for the
tip, definitely something to keep in mind.

Hopefully in future versions of the SDK, the dev server will start to
better mimic the behavior of the actual App Engine framework. It
would be really great if, for example, it gave similar CPU usage
warnings.

> > The overall solution that seems to be working well for me at the
> > moment is to have different tables for different resolutions. As the
> > size of the geometric bounds increases I switch between a few tables,
> > each one with a lower fidelity therefore reducing the number of
> > results that can be returned. Visually it works similar to Level Of
> > Detail techniques you see in some 3D modeling packages.
>
> I'm curious how you're doing this with only a limited number of
> queries. Geohashing isn't ideally suited to satisfying bounding box
> queries (though it's certainly better than storing plain lat/longs).

Please tell me if I'm wrong, but isn't geohashing the only way you can
do a bounding box type query with a single datastore query? I must
admit during early development I just assumed I was going to be able
to do a query something like:

'SELECT * WHERE lat < top AND lat > bottom AND long > left AND long < right ...'

Got a bit of a shock when I found I could only use inequality filters
on one field per query. The only other way I thought of doing it was
to query based on longitude, then filter the results by latitude in a
loop afterwards. Knowing what I do now (queries that return a lot of
results chew up CPU cycles), I'd say this would be the wrong approach.

As for the level of detail stuff, it's nothing too sophisticated; I'll
try to elaborate. It's unrelated to geohash.

My app has 4 tables: 1 contains all data points, the other 3 are of
varying resolutions (LOD tables). When adding a point (lat/long) it
gets put into the table containing all data points.
Next I start adding this same point to the appropriate LOD tables.
For the 'high res' one I round the lat/long to 2 decimal places and
compute the geohash. If the geohash is already present in the table
then this point has been fully added; otherwise it is added to the
'high res' table and we continue. The same lat/long is then rounded
to 1 decimal place, the geohash of this is calculated and checked
against the 'medium res' LOD table; if present then just return. If
not, add it and do something similar again for the 'low res' LOD
table.

Points are obtained from the app by a bounding box, the lat/long of
the NE and SW corners. From this we can calculate a rough size unit
for the bounding box. At the moment I'm using the diagonal length in
degrees squared. From this number we determine which table to query.
For large bounding boxes the 'low res' LOD table is used, for small
boxes the 'high res' LOD table is used. For even smaller bounding
boxes I just get the results out of the table containing all data
points.

Hope that made sense. Anyway, if you want to see it in action check
out 'bikebingle.appspot.com'. Please enter as much random data as you
want; all the stuff in there at the moment is just test data and will
be removed soonish. If you find any bugs while you're there, I'd love
to know about them :-). Hopefully BikeBingle will be 'going live' in
the next couple of days.

BTW, I wouldn't click the 'Make random' button (for debugging
purposes), it fires off 100 POST requests... Of course by saying that
I know someone's going to click it ;-)
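
For anyone following along, the LOD scheme described above can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the BikeBingle code: plain Python dicts stand in for
the datastore tables, the names (geohash, add_point, choose_table,
LOD_LEVELS) are made up for the example, the 'low res' rounding of 0
decimal places and the size thresholds are guesses (the post doesn't
give them), and the box size is taken to be the squared diagonal in
degrees, which is one reading of the description.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, length=8):
    """Standard geohash encoding: interleave lon/lat bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

# (table name, decimal places), finest first. 2 and 1 come from the post;
# 0 for 'low' is an assumption.
LOD_LEVELS = [("high", 2), ("medium", 1), ("low", 0)]

all_points = []                                   # stand-in for the full table
lod_tables = {name: {} for name, _ in LOD_LEVELS} # stand-ins for the LOD tables

def add_point(lat, lon):
    """Store the point, then add it to each LOD table until one already has it."""
    all_points.append((lat, lon))
    for name, decimals in LOD_LEVELS:
        rlat, rlon = round(lat, decimals), round(lon, decimals)
        key = geohash(rlat, rlon)
        if key in lod_tables[name]:
            return  # already represented here, so the coarser tables have it too
        lod_tables[name][key] = (rlat, rlon)

# Placeholder thresholds on the squared diagonal (degrees^2); tune for real data.
THRESHOLDS = [(0.01, "all"), (1.0, "high"), (100.0, "medium")]

def choose_table(sw, ne):
    """Pick a table name from the size of the (lat, lon) bounding box."""
    size = (ne[0] - sw[0]) ** 2 + (ne[1] - sw[1]) ** 2
    for limit, name in THRESHOLDS:
        if size < limit:
            return name
    return "low"
```

The early return in add_point relies on the fact that a point already
present at a fine resolution must already have been pushed down to the
coarser tables when it was first added, so at most three existence
checks are needed per insert.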