Here's another way I call a "Keybox search."

-John Coryat


Performing a Spatial Search without a Spatially Enabled Database

Author: John D. Coryat, USNaviguide (http://www.usnaviguide.com and 
http://maps.huge.info )
September 22, 2007

Most mapping applications that involve querying a database have a 
requirement to filter records by either distance to some object or being 
contained in a polygon. These queries can be cumbersome and slow, 
especially if using a non-spatially enabled database such as MySQL or MS 
SQL Server. Even using a database capable of doing these functions 
internally such as PostgreSQL, PostGIS or Oracle can be quite slow, 
especially if there are huge number of records involved.

A method that reduces the time and complexity involved with filtering out 
records is to perform a search of the database utilizing a fast and 
efficient native function to all database systems, the integer key (an 
integer column in the database with an index attached). The trick to doing 
this is to convert a coordinate into an integer, and then select out the 
values that would include all possible candidates while not rejecting any 
valid ones.

I developed an algorithm that performs this integer key search which is 
fast, efficient, flexible and easy to code. It's particularly well suited 
for large databases and will work in any language, and virtually any 
database. I call this method the "Keybox Search."

A keybox search computes likely candidates from this key and the "boxes" 
surrounding it. The "boxes" can be three, five or seven boxes on each side. 
It can be larger, but instead of increasing the number beyond five squares, 
it is better to change the key to be less precise (see example). The reason 
for "odd" numbers of "boxes" is that there should be the same number of 
"boxes" on each side of the coordinate, so figure one for the value of the 
key (middle), then two to the left and two to the right adds up to: 2 + 1 + 
2 = 5 or 25 (five vertical and five horizontal) boxes, for one on each 
side, that works out to 1 + 1 + 1 = 3 or 9 (3x3=9) keys. If this seems 
confusing, take a piece of graph paper and mark a spot in the center, then 
mark every box around it, the total of your marked boxes is nine, that's a 
three square box search. Lets include a diagram here of that. I also think 
this should be moved down, see my final comment. -Pamela 9/23/07 12:43 PM


I use this algorithm in a number of applications, from searching a database 
of over 100,000,000 records in the reverse geocoder to locating a point in 
a polygon with a database of 100,000 polygons. I use PostgreSQL and PostGIS 
databases, which both have spatial search functions built in, but limiting 
the number of records that actually hit the built in functions speeds the 
search up dramatically.

The keybox search method takes a point and transforms it into an integer in 
this manner:

Example: (5 mile bounding box for a polygon, 1 degree ~ 50 miles or 1/10th 
a degree)

35.12345,-90.54321

1/10th degree accuracy: 35.1,-90.5

Translate into an integer:

absolute value(round to zero decimals(35.12345 x 10)) x 10000 = 3510000

absolute value(round to zero decimals(-90.54321 x 10)) = 905

3510000 + 905 = 3510905 : this is the integer key.


To construct a list of keybox keys to search for, a loop within a loop is 
constructed that iterates through the possible values for each latitude and 
longitude, rounded for precision to meet the requirements of the 
appropriate box. See the code example at the end of this article for 
details. In the example above, using a keybox search based on a 3x3 box, 
the keys to include would be:

3500904 3500905 3500906
3510904 3510905 3510906
3520904 3520905 3520906

Note the key in the middle is the same as the starting key. The reason the 
surrounding keys are required is due to the position of the point may be 
near the extremes of the rounding. To catch all possible candidates, the 
box search is required. This will include any keybox that is within about 
5-7 miles of the original point. While this still may include a large 
number of candidates, it will certainly exclude a huge number, all without 
ever doing any spatial calculations. This method is extremely fast, as 
integer keys are the fastest type of index search in a database, regardless 
of database type.

To calculate a keybox key for a polygon, take the bounding box and compute 
a centroid. This is a good likely point for a polygon. In SQL terms, this 
calculation will create keybox keys based on the bounding box (PostgreSQL 
syntax):

update mygeometrytable set keybox = (round(abs((center(box(poly)))[1])*10) 
* 10000) + round(abs((center(box(poly)))[0])*10) ;


The keybox search method is not generic. It has to be modified for each 
intended use by determining the maximum size of polygon or the distance 
from a point that will be searched. This can be adjusted by altering the 
precision of the key and by the number of keybox's being searched. A 5x5 
grid of 1/10th degree precision will yield about a 12 mile bounding box, 
depending on latitude. It is better to err on the side of large then small, 
as the results are still very fast even if too many candidates get through 
the integer screening. By erring on the large side, the calculation will 
never fail.


International note: This code is optimized for the US, as the latitudes are 
always positive and the longitudes are always negative. For other 
situations, the keybox method can easily be modified to include something 
in the key to indicate the difference, like adding a digit to the front of 
the key. As long as the database doesn't contain any records that violate 
local conditions, no changes are required.

Here is a perl code to construct a query (3x3 box).


#!/usr/bin/perl -w
# Sample Perl code to construct a 3x3 keybox...
# Author: John D. Coryat, USNaviguide 09/2007...

use strict ;

my $h = 0 ;
my $i = 0 ;
my $j = 0 ;
my $k = 0 ;
my $x = '' ;
my $sql = '' ;
my @t = ( ) ;
my $lat = 35.12345 ;
my $lng = -90.54321 ;

$j = sprintf( "%0.0f", abs($lat * 10) ) ; # Latitude converted to 3 digits
$k = sprintf( "%0.0f", abs($lng * 10) ) ; # Longitude convered to 4 digits

# Reduce the starting point...
# Use 1: 3x3, 2: 5x5, 3: 7x7...

$j = $j - 1 ; # Subtract to the low latitude
$k = $k - 1 ; # Subtract to the low longitude

# Construct an array of search keys...
# Use 3: 9 key, 5: 25 key or 7: 49 key...
# Use 1000: 1 Degree, 10000: 1/10th, 100000: 1/100th...

for ( $i = 0; $i < 3; $i++ ) # Loop through latitude
{
for ( $h = 0; $h < 3; $h++ ) # Loop through longitude
{
push( @t, ($j + $i) * 10000 + ($k + $h) ) ; # Change 10000 based on 
precision
}
}

# Loop through search key array and construct a string with the "or" SQL 
statements...
# Use 9: 3x3, 25: 5x5 or 49: 7x7...

$x = '' ;
for ( $i = 0; $i < 9; $i++ )
{
$x .= " keybox = $t[$i] or" ;
}

# Trim off the last 3 characters ( or) from the string...

$x = substr($x, 0, length($x) - 3) ; # Trim off extra

# Create the select statement for point submitted in the polygon database...

$sql = "select * from mygeometrytable where ($x) and point '($lat,$lng)' @ 
poly" ;

print "$sql\n" ;


Here's the output from the sample program:

select * from mygeometrytable where ( keybox = 3500904 or keybox = 3500905 
or keybox = 3500906 or keybox = 3510904 or keybox = 3510905 or keybox = 
3510906 or keybox = 3520904 or keybox = 3520905 or keybox = 3520906 ) and 
point '(35.12345,-90.54321)' @ poly




-- 
You received this message because you are subscribed to the Google Groups 
"Google Maps API V2" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/google-maps-api/-/By_55c97hGkJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-maps-api?hl=en.

Reply via email to