[
https://issues.apache.org/jira/browse/LUCENE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300197#comment-15300197
]
Dawid Weiss commented on LUCENE-7299:
-------------------------------------
No problem at all, Adrien. Sorry for not reading your patch in detail -- I just
skimmed through quickly since it's something we had worked on before.
bq. [...] except the redistribution logic and the fact that your imp has the
ability to parallelize in a ForkJoinPool. I tried to replace the redistribution
logic out of curiosity but performance was the same.
Both are needed because our sorted sets are much, much larger than typical
Lucene buffers. When you have millions of (smallish) entries to sort, a
redistribution index takes a lot of extra space -- our redistribution logic was
never meant as a performance win; it is a memory-conserving strategy.
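
To illustrate the memory point, here is a minimal sketch of an MSD radix sort
that redistributes keys in place (American-flag style), so the only extra
memory per level is two 257-entry counters rather than an index sized to the
input. It is an illustration under assumptions, not the code in the attached
ByteBlockListSorter.java: the class name, the byte[][] input, and the
small-range cutoff are all hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only; names and layout are hypothetical.
final class InPlaceRadixSortSketch {
  private static final int SMALL = 16;    // below this, use a comparison sort
  private static final int BUCKETS = 257; // 256 byte values + "key ended here"

  /** Sorts keys in unsigned byte order; a key sorts before its extensions. */
  static void sort(byte[][] keys) {
    sort(keys, 0, keys.length, 0);
  }

  // Bucket 0: key has no byte at depth d. Bucket v + 1: byte at depth d is v.
  private static int bucket(byte[] key, int d) {
    return d < key.length ? (key[d] & 0xFF) + 1 : 0;
  }

  private static void sort(byte[][] keys, int from, int to, int d) {
    if (to - from <= SMALL) {
      // Arrays.compareUnsigned(byte[], byte[]) requires Java 9 or later.
      Arrays.sort(keys, from, to, (a, b) -> Arrays.compareUnsigned(a, b));
      return;
    }
    int[] next = new int[BUCKETS]; // next unsettled slot of each bucket
    int[] end = new int[BUCKETS];  // histogram first, then exclusive bucket ends
    for (int i = from; i < to; i++) {
      end[bucket(keys[i], d)]++;
    }
    int pos = from;
    for (int b = 0; b < BUCKETS; b++) {
      next[b] = pos;
      pos += end[b];
      end[b] = pos;
    }
    // Permute in place: swap each misplaced key into its own bucket's next slot.
    for (int b = 0; b < BUCKETS; b++) {
      while (next[b] < end[b]) {
        int target = bucket(keys[next[b]], d);
        if (target == b) {
          next[b]++;
        } else {
          byte[] tmp = keys[next[b]];
          keys[next[b]] = keys[next[target]];
          keys[next[target]] = tmp;
          next[target]++;
        }
      }
    }
    // Keys in bucket 0 are all equal at this point, so only buckets 1..256 recurse.
    int start = end[0];
    for (int b = 1; b < BUCKETS; b++) {
      if (end[b] - start > 1) {
        sort(keys, start, end[b], d + 1);
      }
      start = end[b];
    }
  }

  public static void main(String[] args) {
    byte[][] keys = {
      "sort".getBytes(StandardCharsets.UTF_8),
      "radix".getBytes(StandardCharsets.UTF_8),
      "rad".getBytes(StandardCharsets.UTF_8)
    };
    sort(keys);
    for (byte[] k : keys) {
      System.out.println(new String(k, StandardCharsets.UTF_8));
    }
  }
}
```

The trade-off in this sketch is that the swap-based permutation is not stable
and touches memory less predictably than copying through a scratch buffer;
what it buys is that no second array proportional to the number of entries is
allocated.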
bq. I think it should be fine with BytesRefHash since it just returns a
BytesRef that points to an internal structure rather than copying bytes.
Again, this was a significant performance boost in our case because of the size
of the structures we sort. We didn't copy the content of strings either, but even
filling in the pointer and length of a reused "pointer-like" class (much like
BytesRef) was quite costly. There is also a related issue about avoiding extra
allocations in BytesRefHash that I filed a while ago -- if you're working on
that piece of code you may be interested in looking at it (LUCENE-5854).
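
As a rough illustration of sorting without filling a reusable view object on
every access, the sketch below reads single bytes straight out of a flat pool
while sorting entry ids with a multikey (three-way radix) quicksort. The flat
byte[] pool and the offsets/lengths arrays are hypothetical and are not how
BytesRefHash stores its entries; the class and method names are made up for
this example.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the storage layout and names are hypothetical.
final class FlatPoolSketch {
  private final byte[] pool;   // all keys concatenated
  private final int[] offsets; // start of key i inside pool
  private final int[] lengths; // length of key i

  FlatPoolSketch(byte[][] keys) {
    int total = 0;
    for (byte[] k : keys) {
      total += k.length;
    }
    pool = new byte[total];
    offsets = new int[keys.length];
    lengths = new int[keys.length];
    int pos = 0;
    for (int i = 0; i < keys.length; i++) {
      offsets[i] = pos;
      lengths[i] = keys[i].length;
      System.arraycopy(keys[i], 0, pool, pos, keys[i].length);
      pos += keys[i].length;
    }
  }

  /** Byte of key id at depth d as 0..255, or -1 if the key is shorter. */
  private int byteAt(int id, int d) {
    return d < lengths[id] ? (pool[offsets[id] + d] & 0xFF) : -1;
  }

  /** Returns entry ids ordered by their keys (unsigned byte order). */
  int[] sortedIds() {
    int[] ids = new int[offsets.length];
    for (int i = 0; i < ids.length; i++) {
      ids[i] = i;
    }
    sort(ids, 0, ids.length, 0);
    return ids;
  }

  // Multikey quicksort: three-way partition on the byte at depth d.
  private void sort(int[] ids, int lo, int hi, int d) {
    if (hi - lo <= 1) {
      return;
    }
    int pivot = byteAt(ids[lo + (hi - lo) / 2], d);
    int lt = lo, i = lo, gt = hi - 1;
    while (i <= gt) {
      int c = byteAt(ids[i], d);
      if (c < pivot) {
        swap(ids, lt++, i++);
      } else if (c > pivot) {
        swap(ids, i, gt--);
      } else {
        i++;
      }
    }
    sort(ids, lo, lt, d);             // smaller byte at depth d
    if (pivot >= 0) {
      sort(ids, lt, gt + 1, d + 1);   // equal byte: look at the next depth
    }
    sort(ids, gt + 1, hi, d);         // larger byte at depth d
  }

  private static void swap(int[] a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
  }

  public static void main(String[] args) {
    String[] words = {"radix", "sort", "rad", ""};
    byte[][] keys = new byte[words.length][];
    for (int i = 0; i < words.length; i++) {
      keys[i] = words[i].getBytes(StandardCharsets.UTF_8);
    }
    for (int id : new FlatPoolSketch(keys).sortedIds()) {
      System.out.println(id + " \"" + words[id] + "\"");
    }
  }
}
```

Whether skipping the view object is measurable depends on the data sizes
involved; the sketch only shows the access pattern, not a benchmark.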
> BytesRefHash.sort() should use radix sort?
> ------------------------------------------
>
> Key: LUCENE-7299
> URL: https://issues.apache.org/jira/browse/LUCENE-7299
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: ByteBlockListSorter.java, LUCENE-7299.patch,
> LUCENE-7299.patch
>
>
> Switching DocIdSetBuilder to radix sort helped make things significantly
> faster. We should be able to do the same with BytesRefHash.sort()?