Oh, and one bit of advice on lat/lon boxes -

The order in which the engine performs the lat and lon queries (lat then lon,
or lon then lat) matters.  Try swapping the terms in the queries.  You want
the engine to select the vertical stripe (lon between x and y) first in most
of the US, because that has the least chance of picking up other major metro
areas.  If you're really slick, though, you'll switch it up depending on the
area of the country, because that vertical stripe is murder on certain parts
of the East Coast.

(This all assumes you're in the US, but the principle is the same wherever
you are - check your map!)
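Here's a rough Ruby sketch of "switching it up".  It estimates how many
documents each stripe would match from a sample of your own points, then puts
the smaller stripe first.  It assumes, as above, that the engine evaluates the
terms in the order they're written.  `stripe_count`, `ordered_box_query`, and
the sample `points` array are all hypothetical, and the coordinates are used
raw (no offset).

```ruby
# Minimal sketch: put the more selective stripe first in a lat/lon box query.
# ASSUMPTIONS: the engine evaluates AND-ed terms in the order written, and
# `points` is a hypothetical sample of [lat, lon] pairs from your own data,
# used only to estimate how many documents each stripe would match.

# How many sample points fall inside [min, max] on one axis (0 = lat, 1 = lon).
def stripe_count(points, axis, min, max)
  points.count { |pt| pt[axis].between?(min, max) }
end

def ordered_box_query(box, points)
  lat_term = "latitude:[#{box[:lat_min]} #{box[:lat_max]}]"   # horizontal stripe
  lon_term = "longitude:[#{box[:lon_min]} #{box[:lon_max]}]"  # vertical stripe
  lat_hits = stripe_count(points, 0, box[:lat_min], box[:lat_max])
  lon_hits = stripe_count(points, 1, box[:lon_min], box[:lon_max])
  # Lead with whichever stripe matches fewer sample points; ties favor the
  # vertical stripe, the usual choice for most of the US.
  first, second = lon_hits <= lat_hits ? [lon_term, lat_term] : [lat_term, lon_term]
  "#{first} AND #{second}"
end
```

With a sample that has lots of points at the same latitude but few in the
same longitude band, the longitude clause comes out first; if every sample
point shares one longitude, the latitude clause does.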


Schnitz

On 1/31/07, Matt Schnitz <[EMAIL PROTECTED]> wrote:

I could go on for hours about this, but lemme see if I can summarize what
I know.

Search engines aren't really optimized for geographic queries (unless they
have a built-in geographic index feature).  You can make them work, but it's
gonna be a hack.

Several things to try:
- What you described already, the lat/lon box idea.  The issue with that
is that it selects the lat first and the lon second.  By doing so, it's
like taking a horizontal or vertical stripe of the country and sticking it
in a temporary result set.  That temporary result set can be HUGE,
especially on the US East Coast, which is why this is slow.
- You could instead select all the zip codes or postal codes in the area
first, then do your precise calculations.  That will be fast if and only if
Ferret can handle that many query terms at once.  Most search engines can't,
really, but even so, this is usually a bit faster than the first solution.
The hard part here is computing the set of zip codes you want in the first
place.
- You could limit individual queries to greater metro areas first, then do
your precise calculations.  Two issues here: one, getting that data; two,
those areas have borders, so covering a search that straddles one is
difficult at best.
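A minimal sketch of the zip-code approach, split into the steps described:
pick the zips, build one OR query, then do the precise distance check.  The
centroid table, the `zip` field name, and the `slack` margin are assumptions.
Including a zip when its centroid is within radius plus slack is only a crude
stand-in for real zip coverage, which is exactly the hard part mentioned
above.  The precise check uses the haversine great-circle formula.

```ruby
EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Great-circle distance in miles between two [lat, lon] points (haversine).
def haversine_miles(a, b)
  lat1, lon1 = a.map { |d| d * Math::PI / 180 }
  lat2, lon2 = b.map { |d| d * Math::PI / 180 }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h))
end

# Step 1 (crude, hypothetical): keep zips whose centroid is within
# radius + slack miles of the center.  `centroids` maps zip => [lat, lon].
def zips_near(centroids, center, radius, slack: 5.0)
  centroids.select { |_zip, pt| haversine_miles(center, pt) <= radius + slack }.keys
end

# Step 2: one OR query over a hypothetical untokenized `zip` field.
def zip_query(zips)
  zips.map { |z| "zip:#{z}" }.join(" OR ")
end

# Step 3: the precise radius check on whatever records come back.
def within_radius(records, center, radius)
  records.select { |r| haversine_miles(center, [r[:lat], r[:lon]]) <= radius }
end
```

For example, with rough, illustrative centroids for 10001 (Manhattan), 19103
(Philadelphia), and 94105 (San Francisco), a 100-mile search around Manhattan
picks the first two, and `zip_query` turns them into
`zip:10001 OR zip:19103`.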

Faster still is doing the zip-code coverage solution in a database.  The
reason this'd be faster is that you can take your set of zip codes covering
your search area and join them to the table in question.  Joins are much
faster than a long list of search query terms.
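For comparison, here's an in-memory Ruby analogue of that join, roughly what
a database hash join does: put the coverage zips in a set and probe it once
per record, so each record costs a single lookup.  The record layout and the
table names in the SQL comment are hypothetical; a real database's planner
picks the join strategy itself.

```ruby
require 'set'

# In SQL this is roughly (table and column names are hypothetical):
#   SELECT l.* FROM listings l JOIN search_zips z ON z.zip = l.zip
def join_on_zip(coverage_zips, records)
  wanted = coverage_zips.to_set                    # build side: hash the zip set once
  records.select { |r| wanted.include?(r[:zip]) }  # probe side: one lookup per record
end
```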

Like I said, the true solution is geographic indexes.  Unfortunately,
Ferret doesn't have them.  Maybe Lucene does?  It's possible to fake
geographic indexes in a non-geographic engine, but it's really nasty math;
I'd only recommend that if you need to bleed every last ounce of speed out
of it.
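One common way to fake a geographic index in a plain term index is a
geohash, sketched below; it's not necessarily the math meant above.  The
encoder alternately halves the longitude and latitude ranges, writes one bit
per halving, and packs every 5 bits into a base-32 character, so nearby
points usually share a prefix.  Stored in an untokenized field, a prefix
query on the first few characters then approximates a box search.  Points
just across a cell edge can get different prefixes, so real implementations
also query the neighboring cells; that part is left out here.

```ruby
GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

# Encode a point as a geohash string of `precision` characters.
def geohash(lat, lon, precision = 9)
  lat_rng = [-90.0, 90.0]
  lon_rng = [-180.0, 180.0]
  code = String.new
  bits = 0
  bit_count = 0
  use_lon = true  # geohash starts with a longitude bit
  while code.length < precision
    rng, val = use_lon ? [lon_rng, lon] : [lat_rng, lat]
    mid = (rng[0] + rng[1]) / 2
    if val >= mid
      bits = (bits << 1) | 1  # upper half: emit 1, raise the lower bound
      rng[0] = mid
    else
      bits <<= 1              # lower half: emit 0, lower the upper bound
      rng[1] = mid
    end
    use_lon = !use_lon
    bit_count += 1
    next unless bit_count == 5
    code << GEOHASH_BASE32[bits]  # every 5 bits become one character
    bits = 0
    bit_count = 0
  end
  code
end
```

For instance, `geohash(57.64911, 10.40744, 11)` starts with `u4pru`; a
shorter prefix such as `u4pr` covers a larger cell around that point.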


Schnitz

On 1/31/07, Michael Moen <[EMAIL PROTECTED]> wrote:
>
>
> On Jan 31, 2007, at 6:21 AM, Bob Aman wrote:
>
> > And for the search query, so far, I've been using acts_as_ferret's
> > find_by_contents method.  But now I need to figure out how to take an
> > array of results from the range query, and do the find_by_contents
> > magic only on the entries in that array.  So far, every method I've
> > thought of looks like it's going to have performance problems.
> >
> > Any suggestions?
>
> Bob- I don't know what would be involved in using this method with
> aaf, but I am using this method for a bounding box geo query. If you
> need a stricter radial search, you can use a custom filter with
> 0.10.x.
>
> During index population I'm doing:
>
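> # The +1000 offset keeps every value positive, so the indexed terms
> # sort in numeric order (as long as all values have the same number of
> # digits before the decimal point, which holds for US coordinates).
> doc << Field.new("latitude", latitude.to_f + 1000,
>    Field::Store::NO, Field::Index::UNTOKENIZED)
> doc << Field.new("longitude", longitude.to_f + 1000,
>    Field::Store::NO, Field::Index::UNTOKENIZED)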
>
> and during the query I'm doing:
>
> query << "latitude:[#{box[:lat_min] + 1000} #{box[:lat_max] + 1000}] AND "
> query << "longitude:[#{box[:lon_min] + 1000} #{box[:lon_max] + 1000}] AND "
>
> I have helper methods outside of this scope to handle the min/max
> that I'm searching in. I also have a complete (yet untested)
> GeoFilter, but we aren't using Ferret 0.10.x yet, so I have no idea
> if it actually works.
>
> Michael-
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk
>
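Michael doesn't show his min/max helpers, so here's a hypothetical guess at
one, plus a builder that produces the same range clauses as his query, +1000
offset included.  `bounding_box` and `box_query` are made-up names, the
69-miles-per-degree figure is an approximation, and the box math ignores the
poles and the 180th meridian.

```ruby
MILES_PER_DEGREE_LAT = 69.0  # approximate; varies slightly with latitude

# Hypothetical helper: a lat/lon box (in degrees) around a center point.
def bounding_box(lat, lon, radius_miles)
  dlat = radius_miles / MILES_PER_DEGREE_LAT
  # A degree of longitude shrinks by cos(latitude) away from the equator.
  dlon = radius_miles / (MILES_PER_DEGREE_LAT * Math.cos(lat * Math::PI / 180))
  { lat_min: lat - dlat, lat_max: lat + dlat,
    lon_min: lon - dlon, lon_max: lon + dlon }
end

# Same range clauses as the quoted query, with the same +1000 offset.
def box_query(box, offset = 1000)
  "latitude:[#{box[:lat_min] + offset} #{box[:lat_max] + offset}] AND " \
    "longitude:[#{box[:lon_min] + offset} #{box[:lon_max] + offset}]"
end
```

At the equator a 69-mile radius gives a box one degree each way; further
north the longitude half-width grows, since each degree of longitude covers
fewer miles there.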

