LUCENE-2359 changed the best fit calculation.  I admit, I'm not entirely 
certain which one is right, so I thought we should step back and talk about 
what we are trying to achieve.

Please correct me if/where I am wrong.

Looking at the problem of tiers/tiles/grids in general, we are taking a sphere 
and projecting it onto a 2D plane.  Next, we are dividing up the plane into 
nested grids/tiers.  Each tier contains 2^(tier id) boxes.  Thus, tier level 2 
divides the earth up into 4 boxes, and tier level 15 divides it into 
2^15 = 32,768 boxes.  We then give each box a unique label, which becomes the 
token that we index.  During indexing, we typically index many tiers, e.g. 
tiers 4 through 15.

During search, we take in a lat/lon and a radius.  The goal is to do a search 
using the fewest terms possible.  Thus, we need to pick the tier that 
contains/covers the radius with the fewest number of boxes so that we can 
enumerate a very small number of documents.  Picking that tier is the best fit 
calculation, which is a method inside of the CartesianTierPlotter.

In the old way, we did:

bf = min( 15, ceil( log2( earth_circumference / 
       ( (miles/2) - sqrt( (miles/2)^2 / 2 ) ) ) + 1 ) )
     // we won't go higher than 15 for accuracy reasons

The new way is:

bf' = ceil ( log2( earth_circumference / ( 2 * miles ) ) )

These are obviously two different calculations.  Never mind the min(15) issue; 
we can easily resolve that one.
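
To put numbers on the difference, here is a minimal sketch of both formulas as 
written above.  It is not the CartesianTierPlotter code itself: the class and 
method names are made up for illustration, and the circumference constant 
(24,901 miles, roughly the equatorial value) is my assumption, so the constant 
in the actual code may differ.  Note that (miles/2) - sqrt((miles/2)^2 / 2) is 
about 0.1464 * miles, so before rounding and the cap the old formula comes out 
about log2(2 / 0.1464) + 1, i.e. roughly 4.8 tiers, above the new one.

```java
// Hypothetical helper class, for comparison only.
class BestFitComparison {
    // Assumption: approximate equatorial circumference of the earth in miles.
    static final double EARTH_CIRCUMFERENCE_MILES = 24901.0;

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Old formula, including the cap at tier 15.
    static int oldBestFit(double miles) {
        double r = miles / 2.0;
        double corner = r - Math.sqrt((r * r) / 2.0);
        int bf = (int) Math.ceil(log2(EARTH_CIRCUMFERENCE_MILES / corner) + 1);
        return Math.min(15, bf);
    }

    // New formula, with no cap, as written above.
    static int newBestFit(double miles) {
        return (int) Math.ceil(log2(EARTH_CIRCUMFERENCE_MILES / (2.0 * miles)));
    }

    public static void main(String[] args) {
        for (double miles : new double[] {1, 10, 100, 1000}) {
            System.out.printf("miles=%.0f old=%d new=%d%n",
                    miles, oldBestFit(miles), newBestFit(miles));
        }
    }
}
```

With that constant, a 100 mile search gives tier 12 the old way and tier 7 the 
new way, so the new calculation picks much coarser boxes for the same radius.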

AFAICT, the new way is much less accurate, but will likely be faster.

So, which is right?  

Unfortunately, I find almost zero documentation on this, probably b/c the 
nomenclature is off, but...

-Grant