[ 
https://issues.apache.org/jira/browse/LUCENE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-7254:
--------------------------------
    Attachment: LUCENE-7254.patch

Here is a patch with {{MatchingPoints}}. It tries to use all the stats we have 
for points to leave less performance on the table.

We can try to make it fancier later, but for now it:
1) decides up-front on sparsity, based on whether the field is sparse
2) computes cost/cardinality as 'counter' if the field is single-valued (which 
is exact); in the multi-valued case it multiplies counter by 'docs per point' 
from field stats.
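
The cost estimate in (2) can be sketched roughly as follows. This is only an 
illustration, not the code in the patch: the class and method names are 
hypothetical, and it assumes 'docs per point' means the field's doc count 
divided by its total point count, and that a field is single-valued exactly 
when those two stats are equal.

```java
// Hypothetical sketch of the cost/cardinality estimate; not taken from the patch.
final class CostEstimateSketch {

    /**
     * @param counter    number of matching points collected so far
     * @param docCount   number of docs that have at least one point in the field (field stat)
     * @param pointCount total number of points indexed in the field (field stat)
     * @return estimated number of matching documents
     */
    static long estimateCost(long counter, long docCount, long pointCount) {
        if (pointCount == docCount) {
            // Assumed single-valued: one point per doc, so the counter is exact.
            return counter;
        }
        // Assumed multi-valued: scale the counter by the average docs per point.
        double docsPerPoint = (double) docCount / (double) pointCount;
        return (long) (counter * docsPerPoint);
    }

    public static void main(String[] args) {
        // Single-valued field: 1000 docs, 1000 points.
        System.out.println(estimateCost(100, 1000, 1000));
        // Multi-valued field: 1000 docs, 4000 points (4 points per doc on average).
        System.out.println(estimateCost(100, 1000, 4000));
    }
}
```

In the multi-valued case this is only an estimate, since several matching 
points can belong to the same document.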

I see the following results in the geo benchmark:
{noformat}
boxquery (this is a 2-D PointRangeQuery): 63.4 QPS -> 85.2 QPS
distance query: 37.2 QPS -> 46.2 QPS
polygon query (n=5): 49.0 QPS -> 61.3 QPS
{noformat}


> DocIDSetBuilder is no good for points
> -------------------------------------
>
>                 Key: LUCENE-7254
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7254
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-7254.patch
>
>
> For the postings lists, I think this approach works well in dense cases (e.g. 
> whole DISIs are added, things are coming in order, etc).
> However, in the points case, it holds back range performance significantly. 
> There are a couple of problems here:
> * expensive cardinality computation (this is a 2% hit) when it's totally 
> unnecessary. We can use index statistics to help here.
> * lots of conditional stuff in add(). This includes growing checks / bitset 
> switching checks and so on (which happen even if you are smart and call 
> grow, and this stuff all adds up). 
> I don't think we should try to create a magical shared API that is both 
> efficient for postings lists of unstructured stuff and at the same time point 
> collection for structured fields. Instead we should just do things 
> differently for points and iterate from there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
