Hi,

My understanding of the benefits of the new algorithm is that it means a
lower tier level resulting in fewer boxes, but more documents inside those
boxes that are outside of the search radius.

While having fewer boxes means fewer term queries to make against the index,
more documents means more costly calculations to filter out those extraneous
documents.

For those doing just Cartesian Tier filtering it seems like the new approach
is a win, but for those doing distance calculations on those documents
passing the filter, it seems to come at a cost.

Cheers
Chris

On Wed, Apr 14, 2010 at 5:00 PM, Grant Ingersoll <gsing...@apache.org>wrote:

> LUCENE-2359 changed the best fit calculation.  I admit, I'm not entirely
> certain which one is right, so I thought we should step back and talk about
> what we are trying to achieve.
>
> Please correct me if/where I am wrong.
>
> Looking at the problem of tiers/tiles/grids in general, we are taking a
> sphere, projecting it into a 2D plane.  Next, we are dividing up the plane
> into nested grids/tiers.  Each tier contains 2^tier id boxes.  Thus, tier
> level 2 divides the earth up into 4 boxes.  2^15 = 32,768 boxes.  We then,
> for each box, give it a unique label which then becomes the token that we
> index.  During indexing, we typically will index many tiers, i.e. tiers 4
> through 15.
>
> During search, we take in a lat/lon and a radius.  The goal is to do a
> search using the fewest terms possible.  Thus, we need to pick the tier that
> contains/covers the radius with the fewest number of boxes so that we can
> enumerate a very small number of documents.  Thus, we need to calculate the
> best fit, which is a method inside of the CartesianTierPlotter.
>
> In the old way, we did:
>
> bf = min( 15, ceil(log2(  earth_circumference /  ( ( miles/2) - sqrt(
> (miles/2)^2  /  2 ) ) ) + 1 )   // we won't go higher than 15 for accuracy
> reasons
>
> The new way is:
>
> bf' = ceil ( log2( earth_circumference / ( 2 * miles ) ) )
>
> These are obviously two different calculations, never mind the min(15)
> issue, we can easily resolve that one.
>
> AFAICT, the new way is much less accurate, but will likely be faster.
>
> So, which is right?
>
> Unfortunately, I find almost zero documentation on this, probably b/c the
> nomenclature is off, but...
>
> -Grant
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>


-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl

Reply via email to