[ 
https://issues.apache.org/jira/browse/LUCENE-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395531#comment-14395531
 ] 

David Smiley commented on LUCENE-5579:
--------------------------------------

I fixed a bug (some unfinished code I overlooked) and I finally see the 
performance numbers I've been expecting.  With the new Intersects predicate 
that differentiates approximate from exact hits, the benchmarked queries were 
~83% faster than without it.  YMMV a ton.  These shapes were all geodetic 
circles, which do involve some trig, but I bet a polygon, esp. a non-trivial 
one, would see more improvement.  This test used distErrPct=0.2, which yields 
a tiny index & fast indexing but super-approximated shapes (very blocky 
looking).  With distErrPct=0.1, the relative improvement became 100% (2x), 
since more detail allows more hits to land in the "exact" bucket.  The index 
grew 93% in size, though.  Note that even at 0.1, this index is about 1/4 the 
size of the default RPT configuration.
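
To illustrate the approx/exact split, here is a minimal sketch in plain Java. 
It is not the patch code and uses no Lucene API: the class name, the planar 
square grid, the point-only documents and the circle query are all assumptions 
made for illustration.  Documents whose cell lies wholly inside the query are 
accepted without looking at their geometry; only documents in cells that 
straddle the query edge pay for the exact check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
class ApproxExactSketch {

    enum Relation { WITHIN, EDGE, DISJOINT }

    /** Matching doc ids plus how many documents needed the exact check. */
    static class Result {
        final List<Integer> hits = new ArrayList<>();
        int exactChecks = 0;
    }

    static double dist2(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2, dy = y1 - y2;
        return dx * dx + dy * dy;
    }

    /** Relation of the square cell [x0, x0+size] x [y0, y0+size] to a circle. */
    static Relation relate(double x0, double y0, double size,
                           double cx, double cy, double r) {
        // Nearest point of the cell to the circle center.
        double nx = Math.max(x0, Math.min(cx, x0 + size));
        double ny = Math.max(y0, Math.min(cy, y0 + size));
        if (dist2(nx, ny, cx, cy) > r * r) return Relation.DISJOINT;
        // Farthest corner of the cell; if it is inside, the whole cell is.
        double fx = Math.abs(cx - x0) > Math.abs(cx - (x0 + size)) ? x0 : x0 + size;
        double fy = Math.abs(cy - y0) > Math.abs(cy - (y0 + size)) ? y0 : y0 + size;
        if (dist2(fx, fy, cx, cy) <= r * r) return Relation.WITHIN;
        return Relation.EDGE;
    }

    /** points[i] = {x, y} for doc i; each doc is "indexed" by its grid cell. */
    static Result search(double[][] points, double cellSize,
                         double cx, double cy, double r) {
        Result result = new Result();
        for (int doc = 0; doc < points.length; doc++) {
            double px = points[doc][0], py = points[doc][1];
            double x0 = Math.floor(px / cellSize) * cellSize;
            double y0 = Math.floor(py / cellSize) * cellSize;
            switch (relate(x0, y0, cellSize, cx, cy, r)) {
                case WITHIN:
                    result.hits.add(doc);          // confirmed by the grid alone
                    break;
                case EDGE:
                    result.exactChecks++;          // must consult the real geometry
                    if (dist2(px, py, cx, cy) <= r * r) result.hits.add(doc);
                    break;
                default:
                    break;                         // DISJOINT: cannot match
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] points = { {5.5, 5.5}, {7.9, 5.0}, {7.9, 5.9}, {20, 20} };
        Result res = search(points, 1.0, 5, 5, 3);
        System.out.println("hits=" + res.hits + " exactChecks=" + res.exactChecks);
    }
}
```

In this toy model a smaller cell size plays the role of a smaller distErrPct: 
fewer documents fall into edge cells, so fewer exact checks are needed, at the 
cost of a larger index.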

Now I need to wrap up the TODOs, including a bit more testing.  Maybe re-think 
the name of this thing, although CompositeSpatialStrategy ain't bad.  Perhaps 
this could all go right into SerializedDVStrategy, with the index portion 
being added here made optional?  On the other hand... SerializedDVStrategy is 
but one specific way (BinaryDocValues) to retrieve the shape.  Granted, we 
don't have any similar alternative, nor do I plan to come up with one.  Or 
this code could go into RPT, so that you could optionally add the precision of 
the serialized geometry if you so choose.  Hmmm.

> Spatial, enhance RPT to differentiate confirmed from non-confirmed hits, then 
> validate with SDV
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5579
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5579
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>         Attachments: LUCENE-5579_CompositeSpatialStrategy.patch, 
> LUCENE-5579_SPT_leaf_covered.patch
>
>
> If a cell is within the query shape (doesn't straddle the edge), then you can 
> be sure that all documents it matches are a confirmed hit. But if some 
> documents are only on the edge cells, then those documents could be validated 
> against SerializedDVStrategy for precise spatial search. This should be 
> *much* faster than using RPT and SerializedDVStrategy independently on the 
> same search, particularly when a lot of documents match.
> Perhaps this'll be a new RPT subclass, or maybe an optional configuration of 
> RPT.  This issue is just for the Intersects predicate, which will also apply 
> to Disjoint.  Until resolved in other issues, the other predicates can be 
> handled in a naive/slow way by creating a filter that combines RPT's filter 
> and SerializedDVStrategy's filter using BitsFilteredDocIdSet.
> One thing I'm not sure of is how to expose to Lucene-spatial users the 
> underlying functionality such that they can put other query/filters 
> in-between RPT and the SerializedDVStrategy.  Maybe that'll be done by simply 
> ensuring the predicate filters have this capability and are public.
> It would be ideal to implement this capability _after_ the PrefixTree term 
> encoding is modified to differentiate edge leaf-cells from non-edge leaf 
> cells. This distinction will allow the code here to make more confirmed 
> matches.
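
The naive fallback mentioned in the issue description quoted above, combining 
the two filters, amounts to intersecting two per-document match sets.  A 
minimal sketch follows, using java.util.BitSet as a stand-in for Lucene's 
doc-id sets; the class and method names are hypothetical and this is not how 
BitsFilteredDocIdSet is actually wired up.

```java
import java.util.BitSet;

// Hypothetical stand-in: BitSet plays the role of a per-segment doc-id set.
class NaiveCombineSketch {

    /** Returns the docs present in both sets; the inputs are left unmodified. */
    static BitSet combine(BitSet approxMatches, BitSet exactMatches) {
        BitSet result = (BitSet) approxMatches.clone();
        result.and(exactMatches);
        return result;
    }

    public static void main(String[] args) {
        BitSet approx = new BitSet();   // docs the grid index says may match
        approx.set(1); approx.set(3); approx.set(4);
        BitSet exact = new BitSet();    // docs whose real geometry matches
        exact.set(3); exact.set(4); exact.set(7);
        System.out.println(combine(approx, exact));
    }
}
```

Nothing in this form tells the exact side which documents the index has 
already confirmed, so every candidate still pays for a geometry check.  That 
is the cost the confirmed/non-confirmed distinction is meant to avoid.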


