Short Misses:

 

Let's say you are building a real estate app that pulls the tax, owner, and
last sale price data for a given address. You have decided to optimize it
with a Bloom filter.

 

Step 1: Query Validation

Doing data structure validation makes a nice poor man's bloom. In my
ongoing battle with the Google Bot, which is perfectly happy to read a form
and then stuff gibberish into it to see what comes out, we found we could
drastically reduce datastore calls by validating that a query was asking for
data in a format that made sense before actually processing it.

 

This is not a Bloom filter, but knowing that a phone number is only
numeric, or that first and last names don't have numbers in them, or that
addresses don't have symbols can save you a lot of calls made by bots, or
by humans who type poorly.
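
A minimal sketch of that kind of pre-check, in Python. The patterns and the
is_valid_query helper are my own assumptions for illustration; tune them to
whatever your form actually accepts.

```python
import re

# Hypothetical patterns; a real app would match its own field formats.
PATTERNS = {
    "phone": re.compile(r"\d{10}"),                        # digits only
    "name": re.compile(r"[^\W\d_]+(?:[ '-][^\W\d_]+)*"),   # letters, no digits
    "address": re.compile(r"\d+ [A-Za-z0-9 .]+"),          # no symbols
}

def is_valid_query(kind, value):
    """Return True only if value has a plausible shape for its kind."""
    pattern = PATTERNS.get(kind)
    if pattern is None:
        return False
    return pattern.fullmatch(value.strip()) is not None
```

Anything that fails this check can be rejected before it costs you a
datastore call.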

 

Step 2: Compact Approximators

 

You can get much better cache hits if you format the valid queries the same
way every time. Consider forcing the case of the query, stripping characters
and white space, and adding an ignore list for things which may not be needed.

 

1313 MockingBird Lane

1313 MockingBird LN

1313 MockingBird ln

1313 Mockingbird Lane

 

are all the same address.
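
A rough sketch of that normalization in Python. The STREET_TYPES table and
the normalize_address name are assumptions of mine, and a real table would
need many more street types.

```python
import re

# Hypothetical abbreviation table mapping long street types to short ones.
STREET_TYPES = {"lane": "ln", "street": "st", "avenue": "ave", "road": "rd"}

def normalize_address(raw):
    """Force lowercase, strip punctuation and extra spaces, unify street types."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", raw.lower())
    words = [STREET_TYPES.get(word, word) for word in cleaned.split()]
    return " ".join(words)
```

With this, all four spellings above come out as the same cache key.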

 

When I mentioned "short misses", the term I was looking for was "Compact
Approximators".

 

Converting 1313 Mockingbird Lane to a compact approximator helps you avoid
lookups for things you don't have.

 

A sample implementation might be to look only at the non-vowels, and to
leave out the street type designation.

 

So you would look at 1313mckngbrd.

 

For the purpose of your Bloom lookup you would not include the city and
state.
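
Here is one way that key could be built, as a sketch only. The compact_key
name and the STREET_TYPES ignore list are assumptions of mine, and it expects
just the street line, with no city or state.

```python
import re

# Hypothetical ignore list of street type designations.
STREET_TYPES = {"lane", "ln", "street", "st", "avenue", "ave", "road", "rd"}

def compact_key(street_line):
    """Drop a trailing street type, then drop vowels, spaces, and symbols."""
    words = re.sub(r"[^a-z0-9\s]", "", street_line.lower()).split()
    if words and words[-1] in STREET_TYPES:
        words = words[:-1]
    return re.sub(r"[aeiou]", "", "".join(words))
```

compact_key("1313 Mockingbird Lane") gives 1313mckngbrd, and so does
"1313 MockingBird LN".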

 

This may not shrink your data set enough to make it fit in the Bloom, in
which case you may want a tiered Bloom, based on the number of unique
entries: a house-number Bloom that answers "Have we got any data on
addresses with number 1313?" and a street-name Bloom that answers the same
question for the street. Only check the actual datastore if you have both a
parcel with the number requested and the street requested. This is of
course a bad example, since across the US nearly every number up to 100,000
is likely used, but I didn't want to change my example, so pretend that for
whatever reason it does work.
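
To make the tiered idea concrete, here is a small sketch. The BloomFilter
class is a toy of my own (salted SHA-256 hashes, sizes picked arbitrarily),
and index_address and worth_querying are hypothetical names, not any
library's API.

```python
import hashlib

class BloomFilter:
    """A toy Bloom filter backed by a bytearray of bits."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


number_bloom = BloomFilter()   # tier 1: house numbers
street_bloom = BloomFilter()   # tier 2: compacted street names

def index_address(number, street_key):
    number_bloom.add(number)
    street_bloom.add(street_key)

def worth_querying(number, street_key):
    """Go to the datastore only if both tiers answer 'maybe'."""
    return (number_bloom.might_contain(number)
            and street_bloom.might_contain(street_key))
```

Note that both tiers saying "maybe" does not mean that number exists on that
street, only that each has been seen somewhere, so the datastore still has
the final word.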

 

Step 3: Keeping the bloom alive

Blooms are for speeding things up, so they don't have to be 100% up to date,
but you'd rather they say "We might have this, look it up" than the
declarative "We don't have this, don't bother". Because of this, you want
to update your bloom on the creation of new entries and on changes to
searchable entries, but you don't need to update on deletion.

 

Because of this you will likely want to keep a transaction log of recent
writes, and update the bloom on a timer. When checking, you would search
the transaction log, then the bloom, then the datastore. This prevents a
false "we don't have this" on new data, while still letting the bloom keep
up with new data being added.
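
A sketch of that lookup order, under some loud assumptions: TinyBloom is a
toy filter of my own, the datastore is any dict-like object, and AddressIndex
with its write, flush_log, and lookup methods is hypothetical. In a real app
flush_log would be called from a cron job or timer.

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter; the bit set is stored in one Python int."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits, self.num_hashes, self.bits = size_bits, num_hashes, 0

    def _positions(self, key):
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class AddressIndex:
    def __init__(self, datastore):
        self.datastore = datastore    # assumption: any dict-like store
        self.bloom = TinyBloom()
        self.recent_writes = set()    # transaction log of keys not yet in bloom

    def write(self, key, record):
        self.datastore[key] = record
        self.recent_writes.add(key)

    def flush_log(self):
        """Run on a timer: fold the logged keys into the bloom."""
        for key in self.recent_writes:
            self.bloom.add(key)
        self.recent_writes.clear()

    def lookup(self, key):
        # Order: transaction log, then bloom, then datastore.
        if key in self.recent_writes or self.bloom.might_contain(key):
            return self.datastore.get(key)
        return None  # definite miss, so the datastore call is skipped
```

A deleted entry simply stays in the bloom: the lookup falls through to the
datastore and comes back empty, which costs one call but never gives a wrong
answer.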

 

 

 

 
