LUCENE-2359 changed the best fit calculation. I admit, I'm not entirely certain which one is right, so I thought we should step back and talk about what we are trying to achieve.
Please correct me if/where I am wrong. Looking at the problem of tiers/tiles/grids in general, we are taking a sphere, projecting it into a 2D plane. Next, we are dividing up the plane into nested grids/tiers. Each tier contains 2^tier id boxes. Thus, tier level 2 divides the earth up into 4 boxes. 2^15 = 32,768 boxes. We then, for each box, give it a unique label which then becomes the token that we index. During indexing, we typically will index many tiers, i.e. tiers 4 through 15. During search, we take in a lat/lon and a radius. The goal is to do a search using the fewest terms possible. Thus, we need to pick the tier that contains/covers the radius with the fewest number of boxes so that we can enumerate a very small number of documents. Thus, we need to calculate the best fit, which is a method inside of the CartesianTierPlotter. In the old way, we did: bf = min( 15, ceil(log2( earth_circumference / ( ( miles/2) - sqrt( (miles/2)^2 / 2 ) ) ) + 1 ) // we won't go higher than 15 for accuracy reasons The new way is: bf' = ceil ( log2( earth_circumference / ( 2 * miles ) ) ) These are obviously two different calculations, never mind the min(15) issue, we can easily resolve that one. AFAICT, the new way is much less accurate, but will likely be faster. So, which is right? Unfortunately, I find almost zero documentation on this, probably b/c the nomenclature is off, but... -Grant --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
