Short Misses:
Let's say you are building a real estate app that pulls the tax, owner, and last sale price for a given address. You have decided to optimize it with a Bloom filter.

Step 1: Query Validation

Validating the structure of a query makes a nice poor man's Bloom. In my ongoing battle with the Googlebot, which is perfectly happy to read a form and then stuff gibberish into it to see what comes out, we found we could drastically reduce datastore calls by validating that a query was asking for data in a format that made sense before actually processing it. This is not a Bloom filter, but knowing that a phone number is only numeric, or that first and last names don't have numbers in them, or that addresses don't have symbols, can save you a lot of calls made by bots. Or by humans who type poorly.

Step 2: Compact Approximators

You can get much better cache hits if you format valid queries the same way every time. Consider forcing the case of the query, stripping characters and whitespace, and adding an ignore list for things which may not be needed.

1313 MockingBird Lane
1313 MockingBird LN
1313 MockingBird ln
1313 Mockingbird Lane

are all the same.

When I mentioned "short misses" I was looking for "compact approximators". Converting 1313 Mockingbird Lane to a compact approximator helps you avoid lookups for things you don't have. A sample implementation might be to look only at the non-vowels, sans the street type designation, so you would look at

1313mckngbrd

For the purpose of your Bloom lookup you would not include the city and state.

This may not shrink your data set enough to make it fit in the Bloom, or you may want a tiered Bloom, based on the number of unique entries: a street number Bloom that answers "Have we got any data on addresses with number 1313?" and a street name Bloom that does the same for the street. Only check the actual datastore if you have both a parcel with the number requested and a street with the name requested.
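Steps 1 and 2 could be sketched roughly like this in Python. Treat it as an illustration under assumptions, not a finished implementation: the validation regex, the STREET_TYPES ignore list, the Bloom sizing, and every function name here are made up for the example.

```python
import hashlib
import re

# Hypothetical ignore list of street type designations; extend as needed.
STREET_TYPES = {"lane", "ln", "street", "st", "avenue", "ave", "road", "rd"}

# Step 1: a street number followed by one or more purely alphabetic words.
VALID_ADDRESS = re.compile(r"^\d+(\s+[A-Za-z]+)+$")


def is_valid_address(query):
    """Cheap structural check done before any Bloom or datastore work."""
    return bool(VALID_ADDRESS.match(query.strip()))


def split_address(query):
    """Step 2: return (street_number, compact_street_name)."""
    words = query.lower().split()
    number, street_words = words[0], words[1:]
    # Drop a trailing street type such as "lane" or "ln".
    if len(street_words) > 1 and street_words[-1] in STREET_TYPES:
        street_words = street_words[:-1]
    # Keep only the non-vowels of the street name.
    street = re.sub(r"[aeiou]", "", "".join(street_words))
    return number, street


def compact_key(query):
    number, street = split_address(query)
    return number + street


class BloomFilter:
    """Minimal Bloom filter; sizes are arbitrary example values."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Tiered Bloom: one filter for street numbers, one for street names.
number_bloom = BloomFilter()
street_bloom = BloomFilter()


def remember(address):
    """Record an address that exists in the datastore."""
    number, street = split_address(address)
    number_bloom.add(number)
    street_bloom.add(street)


def worth_a_datastore_call(query):
    """False means the datastore certainly has nothing for this query."""
    if not is_valid_address(query):
        return False
    number, street = split_address(query)
    return (number_bloom.might_contain(number)
            and street_bloom.might_contain(street))
```

After calling remember("1313 Mockingbird Lane"), all four spellings listed above map to the same compact key and pass the tiered check, while a query containing symbols is rejected before either Bloom is consulted.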
This is of course a bad example, since across the whole US probably every number up to 100,000 is in use, but I didn't want to change my example, so pretend that for whatever reason it does work.

Step 3: Keeping the Bloom Alive

Blooms are for speeding things up, so they don't have to be 100% up to date, but you'd rather they say "we might have this, look it up" than the declarative "we don't have this, don't bother". Because of this, you want to update your Bloom on the creation of new entries and on changes to searchable entries, but you don't need to update it on deletion.

You will also likely want to keep a transaction log of recent writes and update the Bloom on a timer. When checking, you would search the transaction log, then the Bloom, then the datastore. This prevents a false "we don't have this" on new data while still letting the Bloom keep up with new data as it is added.
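The lookup order in Step 3 (transaction log, then Bloom, then datastore) might look something like the sketch below. It is only an illustration: the datastore is a plain dict standing in for the real thing, the log is an in-memory set, and BloomedStore, flush_log, and the Bloom sizing are all hypothetical names and values chosen for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes are arbitrary example values."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class BloomedStore:
    """A dict-backed stand-in datastore guarded by a Bloom and a write log."""

    def __init__(self, datastore):
        self.datastore = datastore
        self.bloom = BloomFilter()
        self.recent_writes = set()  # keys written since the last flush
        self.datastore_reads = 0    # counter, only here to show the savings
        for key in datastore:
            self.bloom.add(key)

    def put(self, key, value):
        # New or changed entries go to the log; the Bloom catches up later.
        self.datastore[key] = value
        self.recent_writes.add(key)

    def delete(self, key):
        # No Bloom update on deletion: a stale "maybe" costs one wasted read.
        self.datastore.pop(key, None)
        self.recent_writes.discard(key)

    def flush_log(self):
        """Meant to run on a timer: fold recent writes into the Bloom."""
        for key in self.recent_writes:
            self.bloom.add(key)
        self.recent_writes.clear()

    def get(self, key):
        # Transaction log first, then the Bloom, then the datastore.
        if key not in self.recent_writes and not self.bloom.might_contain(key):
            return None  # definite miss, no datastore call
        self.datastore_reads += 1
        return self.datastore.get(key)
```

A key written with put() is found immediately through the log, even before flush_log() has run, and a key that was never written returns None without touching the datastore. A deleted key still passes the Bloom after a flush, so it costs one datastore read that comes back empty, which is the trade-off described above.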