[ 
https://issues.apache.org/jira/browse/LUCENE-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614654#comment-14614654
 ] 

Adrien Grand commented on LUCENE-6645:
--------------------------------------

Thanks for having a look!

bq. Not sure it will help that much, since most of the cells are visited via 
addAll...

Indeed I did not try to optimize here since my profiler did not see it as a 
hotspot.

bq. Maybe we should try to contain the added hairiness to BKDTreeReader instead 
of DocIdSetBuilder if indeed this is the only user of this API that is so 
strange (or of tons of tiny sorted docID blocks)

I agree the API is a bit crazy now. :) It was a way to avoid checking the array 
length on every call to add(). I'll see how much it costs to remove this 
'Adder' class in favour of a grow() method. However, adding tons of tiny 
sorted doc ID blocks is something that MultiTermQueries can do all the time too, 
so I'm quite happy that your benchmark revealed this inefficiency.
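
To make the grow() idea concrete, here is a minimal sketch of the pattern: the 
caller reserves room once per block of doc IDs, so add() does not need its own 
capacity check. SketchDocIdBuffer and all of its methods are hypothetical names 
used for illustration only; this is not the DocIdSetBuilder code from the patch, 
and it leaves out the sparse-to-dense upgrade entirely.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a doc ID buffer; NOT Lucene's DocIdSetBuilder. */
final class SketchDocIdBuffer {
  private int[] buffer = new int[16];
  private int size = 0;

  /** Reserves room for numDocs more doc IDs: one capacity check per block. */
  void grow(int numDocs) {
    int needed = size + numDocs;
    if (needed > buffer.length) {
      // Grow at least geometrically so many tiny blocks stay cheap overall.
      buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
    }
  }

  /** No capacity check of its own: callers must have called grow() first. */
  void add(int doc) {
    buffer[size++] = doc;
  }

  /** Number of doc IDs added so far, duplicates included. */
  int size() {
    return size;
  }

  /** Returns a sorted, deduplicated copy of the collected doc IDs. */
  int[] toSortedArray() {
    int[] docs = Arrays.copyOf(buffer, size);
    Arrays.sort(docs);
    int n = 0;
    for (int i = 0; i < docs.length; i++) {
      if (i == 0 || docs[i] != docs[i - 1]) {
        docs[n++] = docs[i];
      }
    }
    return Arrays.copyOf(docs, n);
  }
}
```

Under these assumptions, a leaf cell with count matching docs would call 
grow(count) once and then add(doc) for each hit, which is the shape a 
BKDTreeReader-like caller would use instead of holding on to an 'Adder' object.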

> BKD tree queries should use BitDocIdSet.Builder
> -----------------------------------------------
>
>                 Key: LUCENE-6645
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6645
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-6645.patch, LUCENE-6645.patch
>
>
> When I was iterating on BKD tree originally I remember trying to use this 
> builder (which makes a sparse bit set at first and then upgrades to dense if 
> enough bits get set) and being disappointed with its performance.
> I wound up just making a FixedBitSet every time, but this is obviously 
> wasteful for small queries.
> It could be the perf was poor because I was always .or'ing in DISIs that had 
> 512 - 1024 hits each time (the size of each leaf cell in the BKD tree)?  I 
> also had to make my own DISI wrapper around each leaf cell... maybe that was 
> the source of the slowness, not sure.
> I also sort of wondered whether the SmallDocSet in the spatial module (backed 
> by a SentinelIntSet) might be faster ... though it'd need to be sorted in the 
> end after building before returning to Lucene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

