I could go on for hours about this, but lemme see if I can summarize what I
know.
Search engines aren't really optimized for geographic queries (unless they
have a built-in geographic index feature). You can make them work, but it's
gonna be a hack.
Several things to try:
- What you described already, the lat / lon box idea. The issue with that
is that the engine evaluates the lat range first and the lon range second.
By doing so, it's like taking a horizontal or vertical stripe of the whole
country and sticking it in a temporary result set before narrowing it down.
That temporary result set can be HUGE, especially on the US East Coast,
which is why this is slow.
- You could, instead, select all the zip codes or postal codes in the area
first, then do your precise calculations. That will be fast if and only if
Ferret can handle that many query terms at once. Most search engines can't,
really, but still, this is usually a bit faster than the first solution.
The hard part here is computing the set of zip codes you want in the first
place.
- You could limit individual queries to greater metro areas first, then do
your precise calculations. Two issues here: one, getting that data; two,
those areas have borders, so a search near the edge of one area misses
nearby results just across the border. Getting full coverage is difficult
at best.
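
To make "box first, precise calculations second" concrete, here's a rough
sketch in plain Ruby, outside of Ferret entirely. The method names, the
hash layout of the points, and the Earth radius constant are all my own
assumptions for illustration, not anything from Ferret or acts_as_ferret.
It doesn't handle the poles or the 180-degree meridian.

```ruby
EARTH_RADIUS_MILES = 3958.8 # approximate mean Earth radius

def to_rad(deg)
  deg * Math::PI / 180.0
end

def to_deg(rad)
  rad * 180.0 / Math::PI
end

# Lat/lon box enclosing a circle of radius_miles around (lat, lon).
def bounding_box(lat, lon, radius_miles)
  r = radius_miles / EARTH_RADIUS_MILES # angular radius in radians
  dlat = to_deg(r)
  dlon = to_deg(Math.asin(Math.sin(r) / Math.cos(to_rad(lat))))
  { lat_min: lat - dlat, lat_max: lat + dlat,
    lon_min: lon - dlon, lon_max: lon + dlon }
end

# Great-circle distance in miles (haversine formula).
def haversine_miles(lat1, lon1, lat2, lon2)
  dlat = to_rad(lat2 - lat1)
  dlon = to_rad(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end

# Cheap box test first, exact distance only on the survivors.
def within_radius(points, lat, lon, radius_miles)
  box = bounding_box(lat, lon, radius_miles)
  candidates = points.select do |pt|
    pt[:lat].between?(box[:lat_min], box[:lat_max]) &&
      pt[:lon].between?(box[:lon_min], box[:lon_max])
  end
  candidates.select do |pt|
    haversine_miles(lat, lon, pt[:lat], pt[:lon]) <= radius_miles
  end
end
```

The box values are what you'd feed into the range query; the haversine
pass only runs on whatever the box lets through.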
Faster still is doing the zip code coverage area solution in a database.
The reason this'd be faster is that you can take your set of zip codes
covering your search area and join them to the table in question. Joins
are much faster than a long list of search query terms.
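
Here's roughly what I mean by the zip code coverage idea, sketched in
plain Ruby with an in-memory stand-in for the database join. Everything
here is hypothetical: the centroid table, its coordinates (illustrative
values only), the row layout, and the method names. It also cheats by
testing zip centroids only, so a zip whose area overlaps the circle but
whose centroid falls outside gets missed; pad the radius to compensate.

```ruby
# Hypothetical zip => [lat, lon] centroid table (illustrative values only).
ZIP_CENTROIDS = {
  "10001" => [40.7506, -73.9972],
  "07102" => [40.7357, -74.1724],
  "19103" => [39.9526, -75.1652]
}.freeze

# Approximate distance in miles (equirectangular; fine for short ranges).
def approx_miles(lat1, lon1, lat2, lon2)
  mean_lat = (lat1 + lat2) / 2.0 * Math::PI / 180.0
  dx = (lon2 - lon1) * Math.cos(mean_lat)
  dy = lat2 - lat1
  Math.sqrt(dx * dx + dy * dy) * 69.0 # roughly 69 miles per degree
end

# Zip codes whose centroid lies within radius_miles of the search point.
def covering_zips(centroids, lat, lon, radius_miles)
  centroids.select do |_zip, (zlat, zlon)|
    approx_miles(lat, lon, zlat, zlon) <= radius_miles
  end.keys
end

# In-memory equivalent of joining the zip set to the table on the zip
# column: one pass to build the lookup, then one lookup per zip.
def join_by_zip(rows, zips)
  by_zip = rows.group_by { |row| row[:zip] }
  zips.flat_map { |zip| by_zip.fetch(zip, []) }
end

rows = [
  { id: 1, zip: "10001" },
  { id: 2, zip: "19103" },
  { id: 3, zip: "07102" }
]
zips = covering_zips(ZIP_CENTROIDS, 40.7128, -74.0060, 15)
nearby = join_by_zip(rows, zips)
```

In a real database the zip set would live in a (temporary) table and the
join would use the index on the zip column, which is where the speed
comes from.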
Like I said, the true solution is geographic indexes. Unfortunately, Ferret
doesn't have them. Maybe Lucene does? It's possible to fake geographic
indexes in a non-geographic engine, but it's really nasty math; I'd only
recommend that if you need to bleed every last ounce of speed out of it.
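
For what it's worth, one simple way to fake it (not necessarily the
fastest, and a lot less nasty than the real math) is a fixed grid: give
every document a single cell term, then search for the handful of cells
that overlap your box. This is just a sketch. The cell size, the token
format, the "cell" field name, and the OR'd query string are all my own
assumptions, and it presumes the field is indexed untokenized.

```ruby
CELL_DEGREES = 0.5 # hypothetical cell size; tune to your typical radius

# Shift lat/lon to be non-negative so tokens never contain a minus sign.
def cell_row(lat, size = CELL_DEGREES)
  ((lat + 90.0) / size).floor
end

def cell_col(lon, size = CELL_DEGREES)
  ((lon + 180.0) / size).floor
end

# The single term stored with each document at index time.
def cell_token(lat, lon, size = CELL_DEGREES)
  "r#{cell_row(lat, size)}c#{cell_col(lon, size)}"
end

# Every cell token that overlaps the given lat/lon box.
def covering_cells(box, size = CELL_DEGREES)
  rows = cell_row(box[:lat_min], size)..cell_row(box[:lat_max], size)
  cols = cell_col(box[:lon_min], size)..cell_col(box[:lon_max], size)
  rows.flat_map { |r| cols.map { |c| "r#{r}c#{c}" } }
end

# Query fragment matching any covering cell (assumed field:term syntax).
def cell_query(box, field = "cell", size = CELL_DEGREES)
  covering_cells(box, size).map { |t| "#{field}:#{t}" }.join(" OR ")
end
```

The cells only narrow things down to a small candidate set; you still
need the precise distance check on whatever comes back.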
Schnitz
On 1/31/07, Michael Moen <[EMAIL PROTECTED]> wrote:
On Jan 31, 2007, at 6:21 AM, Bob Aman wrote:
> And for the search query, so far, I've been using acts_as_ferret's
> find_by_contents method. But now I need to figure out how to take an
> array of results from the range query, and only do the find_by_contents
> magic on just the entries in that Array. So far, every method I've
> thought of looks like it's going to have performance problems.
>
> Any suggestions?
Bob- I don't know what would be involved in using this method with
aaf, but I am using this method for a bounding box geo query. If you
need a more strict radial search you can use a custom filter with
0.10.x.
During index population I'm doing:
doc << Field.new("latitude", latitude.to_f + 1000,
Field::Store::NO, Field::Index::UNTOKENIZED)
doc << Field.new("longitude", longitude.to_f + 1000,
Field::Store::NO, Field::Index::UNTOKENIZED)
and during the query I'm doing:
query << "latitude:[#{box[:lat_min] + 1000} #{box[:lat_max] + 1000}] AND "
query << "longitude:[#{box[:lon_min] + 1000} #{box[:lon_max] + 1000}] AND "
I have helper methods outside of this scope to handle the min/max
that I'm searching in. I also have a complete (yet untested)
GeoFilter, but we aren't using Ferret 0.10.x yet, so I have no idea
if it actually works.
Michael-
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk