[jira] [Updated] (LUCENE-7299) BytesRefHash.sort() should use radix sort?

Adrien Grand (JIRA) Tue, 24 May 2016 06:25:33 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-7299:
---------------------------------
    Attachment: LUCENE-7299.patch

Here is a patch. I ran a simple benchmark with 10M docs, a single thread, a 
single indexed field and a fairly large RAM buffer of 500 MB. On random binary 
keys of length 16, indexing speed improved by 37%. On random ASCII keys of 
length 0 to 16, indexing speed improved by 34%.

This is not a realistic speedup, since no merging is involved (the RAM buffer 
can hold everything) and there are no stored fields, no doc values, etc. Still, 
I think it is interesting to be able to generate the inverted index more 
quickly for flushed segments. There will probably be a noticeable speedup for 
setups that make heavy use of the inverted index on high-cardinality fields 
and use large RAM buffers.
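
For readers unfamiliar with the technique, here is a minimal sketch of an 
MSB (most-significant-byte-first) radix sort over variable-length byte keys. 
It is an illustration only and is not the code in LUCENE-7299.patch: the class 
name MsbRadixSortSketch, the fallback threshold of 32 and the use of plain 
byte[] keys instead of BytesRefHash ids are all assumptions made for the 
example. It also assumes Java 9+ for Arrays.compareUnsigned.

```java
import java.util.Arrays;

/** Illustrative MSB radix sort on byte[] keys; NOT the code from the attached patch. */
final class MsbRadixSortSketch {

  // Assumption: small ranges are handed to a comparison sort. 32 is an arbitrary cutoff.
  private static final int FALLBACK_THRESHOLD = 32;

  /** Sorts keys in unsigned lexicographic byte order; a shorter key sorts before its extensions. */
  static void sort(byte[][] keys) {
    sort(keys, 0, keys.length, 0, new byte[keys.length][]);
  }

  // Bucket 0 holds keys that end at this depth; buckets 1..256 hold unsigned byte values 0..255.
  private static int bucket(byte[] key, int depth) {
    return depth >= key.length ? 0 : (key[depth] & 0xFF) + 1;
  }

  // Sorts keys[from, to), all of which share the same first `depth` bytes.
  private static void sort(byte[][] keys, int from, int to, int depth, byte[][] scratch) {
    if (to - from <= FALLBACK_THRESHOLD) {
      Arrays.sort(keys, from, to, Arrays::compareUnsigned);
      return;
    }

    // Histogram of the byte at position `depth`.
    int[] counts = new int[257];
    for (int i = from; i < to; i++) {
      counts[bucket(keys[i], depth)]++;
    }

    // Prefix sums give the start offset of each bucket, relative to `from`.
    int[] starts = new int[257];
    for (int b = 1; b < 257; b++) {
      starts[b] = starts[b - 1] + counts[b - 1];
    }

    // Stable distribution into the scratch array, then copy back.
    int[] next = starts.clone();
    for (int i = from; i < to; i++) {
      scratch[from + next[bucket(keys[i], depth)]++] = keys[i];
    }
    System.arraycopy(scratch, from, keys, from, to - from);

    // Recurse on the next byte. Bucket 0 is skipped: its keys all end here, so they are equal.
    for (int b = 1; b < 257; b++) {
      if (counts[b] > 1) {
        sort(keys, from + starts[b], from + starts[b] + counts[b], depth + 1, scratch);
      }
    }
  }
}
```

Calling MsbRadixSortSketch.sort(keys) on a byte[][] should yield the same 
order as Arrays.sort(keys, Arrays::compareUnsigned). The potential gain comes 
from replacing most key comparisons with one histogram pass per byte position, 
which fits high-cardinality random keys like the ones in the benchmark above.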

> BytesRefHash.sort() should use radix sort?
> ------------------------------------------
>
>                 Key: LUCENE-7299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7299.patch
>
>
> Switching DocIdSetBuilder to radix sort helped make things significantly 
> faster. We should be able to do the same with BytesRefHash.sort()?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
