[jira] [Commented] (LUCENE-4157) Improve Spatial Testing

Chris Male (JIRA) Wed, 27 Jun 2012 01:51:49 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402063#comment-13402063
 ]


Chris Male commented on LUCENE-4157:
------------------------------------

{quote}RE QuadPrefixTree, I'll see if I can reproduce your test errors. I'm not 
surprised if the QuadPrefixTree.MAX_LEVELS_POSSIBLE is perhaps too big (notice 
the comment at it's declaration "not really sure how big this should be". 
Assuming the default 12 levels pass, I think we can find a safer max number to 
use for the time being that is less than 50, and maybe one day when we have 
time we can confidently determine exactly what it can support. I venture to 
guess it might be similar to the mantissa of a double which is 53, but perhaps 
not or maybe it's half that or something. FYI about 26 is needed for ~1meter 
accuracy. If a non-geo scenario is needed, then who knows what your 
requirements might be.{quote}

Thanks for that explanation.  I tried with the default of 12 and the tests 
still failed but no error this time.  That could be just related to the fact 
quad trees are less precise than geohashes or maybe some problems with the 
tests.  I think we should just try to come up with some tests for the trees 
themselves to verify that they work as expected.  I see SpatialPrefixTreeTest 
does some testing of GeohashPrefixTree currently, but we should really spin 
that off into its own test class and take QuadTree separately.

{quote}RE Testing of TermQueryPrefixGridStrategy, I agree that its tests are 
too minimal, in Lucene spatial. FWIW, I'm about to update a patch to SOLR-3304 
that tests a variety of strategies against the same test code (based on test 
code from Solr 3 spatial filter tests). TermQueryPrefixGridStrategy passes 
fine.{quote}

Good to know.  I have confidence in TermQueryPrefixGridStrategy since it is 
extremely simple but I think we need to come up with tests to ensure that any 
changes we make to the indexing process is compatible with the querying.

{quote}I definitely welcome any input on making the tests better overall. It's 
a bit of a challenge because there are a variety of strategies, and some like 
TwoDoublesStrategy are known to not yet support certain geo cases like the 
poles (if I recall). I'm not sure if the idea of a test file of query cases was 
your idea or Ryan's (e.g. cities-IsWithin-BBox), but instead or in addition, I 
like the idea of automatically generating random data and queries, and then 
double checking search results against a simple brute force algorithm.{quote}

I don't really like the test file idea at all.  Having them for benchmarking is 
good but we aren't at that stage yet.  Instead I think we should construct 
simple unit tests, indexing a few Shapes and querying for them.  We should do 
that for each Strategy, obviously only indexing Points for TwoDoublesStrategy.  
Having random data and query generation can come later, once we have enough 
crafted tests to be sure that this works.

We should then randomize the use of QuadTree vs GeohashTree or actually repeat 
the tests for both.

We have a big question mark around testing with polygons.  My concern is that 
users will rightly start using JTS Geometrys and our Strategies will fail.  We 
really need to think about how to handle this.

{quote}If you don't feel any better about these two classes, then I like your 
suggestion of not releasing them in 4.0 and leave in trunk.{quote}

QuadTree is my main concern since I don't know whether it is working correctly 
and is just less precise than geohashes or has a bug.  If we can't quickly come 
up with a couple of tests and fix any broken behavior then we should remove it 
from 4.0.  

We should also take this opportunity to remove any unused code / code that 
doesn't actually test anything.  For this I see TruncateFilter, the current 
TestTermQueryPrefixGridStrategy and TestSpatialPrefixField.

I'll try to help out here especially with cleaning out the dead code, but any 
help with testing QuadTree would be great.



                
> Improve Spatial Testing
> -----------------------
>
>                 Key: LUCENE-4157
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4157
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Critical
>             Fix For: 4.0
>
>         Attachments: LUCENE-4157_Improve_Lucene_Spatial_testing_p1.patch
>
>
> Looking back at the tests for the Lucene Spatial Module, they seem 
> half-baked.  (At least Spatial4j is well tested).  I've started working on 
> some improvements:
> * Some tests are in an abstract base class which have a subclass that 
> provides a SpatialContext. The idea was that the same tests could test other 
> contexts (such as geo vs not or different distance calculators (haversine vs 
> vincenty) but this can be done using RandomizedTesting's nifty parameterized 
> test feature, once there is a need to do this.
> * Port the complex geohash recursive prefix tree test that was developed on 
> the Solr side to the Lucene side where it belongs.
> And some things are not tested or aren't well tested:
> * Distance order as the query score
> * Indexing shapes other than points (i.e. shapes with area / regions)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-4157) Improve Spatial Testing

Reply via email to