[jira] [Commented] (LUCENE-7299) BytesRefHash.sort() should use radix sort?

Dawid Weiss (JIRA) Wed, 25 May 2016 08:14:04 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300197#comment-15300197
 ]


Dawid Weiss commented on LUCENE-7299:
-------------------------------------

No problem at all, Adrien. Sorry for not reading your patch in detail -- I just 
skimmed through quickly since it's something we had worked on before.

bq.  [...] except the redistribution logic and the fact that your impl has the 
ability to parallelize in a ForkJoinPool. I tried to replace the redistribution 
logic out of curiosity but performance was the same.

Both are needed because our sorted sets are much, much larger than typical 
Lucene buffers. When you have millions of (smallish) entries to sort, the 
redistribution index took a lot of extra space. Our redistribution logic was 
never a performance win; it was a memory-conservative strategy.
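
To illustrate the memory trade-off, here is a minimal sketch of an MSD radix 
sort that redistributes keys in place (the "American flag" permutation), so 
the only extra memory is a few 257-entry bucket arrays per recursion level 
rather than an index proportional to the number of entries. This is an 
assumption about what a memory-conservative redistribution can look like, not 
the code from the attached patch or from ByteBlockListSorter.java; the class 
name InPlaceRadixSketch and the byte[][] key layout are hypothetical.

```java
public class InPlaceRadixSketch {

    // Bucket for byte d of a key: 0 if the key has ended (prefixes sort first),
    // otherwise the unsigned byte value + 1.
    private static int bucket(byte[] key, int d) {
        return d < key.length ? (key[d] & 0xFF) + 1 : 0;
    }

    public static void sort(byte[][] keys) {
        sort(keys, 0, keys.length, 0);
    }

    private static void sort(byte[][] keys, int from, int to, int d) {
        if (to - from < 2) {
            return;
        }
        // Pass 1: histogram of the d-th byte.
        int[] count = new int[257];
        for (int i = from; i < to; i++) {
            count[bucket(keys[i], d)]++;
        }
        // Bucket boundaries inside [from, to).
        int[] start = new int[257];
        int[] end = new int[257];
        int pos = from;
        for (int b = 0; b < 257; b++) {
            start[b] = pos;
            pos += count[b];
            end[b] = pos;
        }
        // Pass 2: permute in place. next[b] is the first slot of bucket b
        // that is not yet known to hold a key belonging to b.
        int[] next = start.clone();
        for (int b = 0; b < 257; b++) {
            while (next[b] < end[b]) {
                byte[] key = keys[next[b]];
                int kb = bucket(key, d);
                if (kb == b) {
                    next[b]++;
                } else {
                    // Swap the key into the next free slot of its own bucket.
                    int dst = next[kb]++;
                    keys[next[b]] = keys[dst];
                    keys[dst] = key;
                }
            }
        }
        // Recurse on the next byte. Bucket 0 holds keys that ended at d,
        // which are all equal, so it needs no further sorting.
        for (int b = 1; b < 257; b++) {
            sort(keys, start[b], end[b], d + 1);
        }
    }
}
```

Calling InPlaceRadixSketch.sort(keys) orders the array by unsigned byte 
value, with shorter keys before their extensions. A production version would 
normally fall back to a comparison sort for small ranges and bound the 
recursion depth; both are left out here for brevity.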

bq. I think it should be fine with BytesRefHash since it just returns a 
BytesRef that points to an internal structure rather than copying bytes. 

Again, this was a significant performance boost in our case because of the size 
of structures we sort -- we also didn't copy the content of strings, but even 
filling in the pointer and length in a reused "pointer-like" class (much like 
BytesRef) was quite costly. There is also a related issue of avoiding extra 
allocations in BytesRefHash that I filed a while ago -- if you're working on 
that piece of code you may be interested in looking at it (LUCENE-5854).
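
As a rough illustration of the point about pointer-like objects, the sketch 
below reads key bytes straight from a flat byte[] through an offsets array, 
so a comparison or radix pass never fills in an offset/length holder per 
access. It is a hypothetical stand-in, not BytesRefHash or the structure 
discussed in LUCENE-5854; the class FlatKeyPool and its methods are made up 
for this example.

```java
import java.util.Arrays;

public class FlatKeyPool {
    private byte[] bytes = new byte[16];
    // Key id occupies bytes[offsets[id] .. offsets[id + 1]).
    private int[] offsets = new int[4];
    private int size = 0;

    // Appends a copy of the key and returns its id.
    public int add(byte[] key) {
        if (size + 2 > offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        int start = offsets[size];
        int end = start + key.length;
        if (end > bytes.length) {
            bytes = Arrays.copyOf(bytes, Math.max(end, bytes.length * 2));
        }
        System.arraycopy(key, 0, bytes, start, key.length);
        offsets[size + 1] = end;
        return size++;
    }

    // Unsigned k-th byte of key id, or -1 past the end of the key.
    // No intermediate object is created or filled in.
    public int byteAt(int id, int k) {
        int pos = offsets[id] + k;
        return pos < offsets[id + 1] ? bytes[pos] & 0xFF : -1;
    }

    // Lexicographic comparison by unsigned byte; prefixes sort first.
    public int compare(int a, int b) {
        for (int k = 0; ; k++) {
            int x = byteAt(a, k);
            int y = byteAt(b, k);
            if (x != y) {
                return x - y;
            }
            if (x == -1) {
                return 0;
            }
        }
    }
}
```

A sorter built on this would permute an int[] of ids and call byteAt(id, d) 
for each radix pass. Whether that pays off depends on the number of entries; 
the comment above only reports a gain for very large sets.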

> BytesRefHash.sort() should use radix sort?
> ------------------------------------------
>
>                 Key: LUCENE-7299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: ByteBlockListSorter.java, LUCENE-7299.patch, 
> LUCENE-7299.patch
>
>
> Switching DocIdSetBuilder to radix sort helped make things significantly 
> faster. We should be able to do the same with BytesRefHash.sort()?



