[ https://issues.apache.org/jira/browse/LUCENE-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799941#comment-13799941 ]
Paul Elschot commented on LUCENE-5293:
--------------------------------------
Looking again at the benchmark with the question of how to build an EF
(Elias-Fano) docidset without knowing the number of values in advance: one
solution would be to first build a PFD (PForDelta) docidset, because it builds
quickly and has good next() performance. next() would then be used in a single
pass over that temporary set to build the final docidset to be cached.
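A minimal sketch of this two-phase build in plain Java, under stated
assumptions: a growable int[] stands in for the temporary PFD docidset, and the
Elias-Fano encoder is a simplified illustration, not Lucene's EliasFanoDocIdSet.
TwoPassEliasFano and its methods are hypothetical names. It shows why the count
matters: the number of low bits per value, floor(log2(upperBound / numValues)),
is only known once the first pass has finished.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

public class TwoPassEliasFano {
    public final int numValues;
    public final int numLowBits;
    private final long[] lowBits;   // numValues * numLowBits bits, packed
    private final long[] highBits;  // unary-coded high parts

    /** Phase 1: buffer the incoming ids (stand-in for the temporary PFD set). */
    public static TwoPassEliasFano build(PrimitiveIterator.OfInt docs, int upperBound) {
        int[] buf = new int[16];
        int n = 0;
        while (docs.hasNext()) {  // the count is unknown until the stream ends
            if (n == buf.length) buf = Arrays.copyOf(buf, n * 2);
            buf[n++] = docs.nextInt();
        }
        // Phase 2: one pass over the buffered ids, now that n is known.
        return new TwoPassEliasFano(buf, n, upperBound);
    }

    private TwoPassEliasFano(int[] docs, int n, int upperBound) {
        numValues = n;
        // The low-bit width depends on n, which is why EF needs the count up front:
        // floor(log2(upperBound / n)), or 0 when upperBound / n < 2.
        int l = 0;
        if (n > 0) {
            for (int q = upperBound / n; q > 1; q >>>= 1) l++;
        }
        numLowBits = l;
        lowBits = new long[(int) (((long) n * l + 63) >>> 6)];
        long highLen = n + ((long) upperBound >>> l) + 1;
        highBits = new long[(int) ((highLen + 63) >>> 6)];
        long lowMask = (1L << l) - 1;
        for (int i = 0; i < n; i++) {
            long doc = docs[i];
            writeBits(lowBits, (long) i * l, l, doc & lowMask);
            long pos = (doc >>> l) + i;  // the i-th set bit encodes the high part
            highBits[(int) (pos >>> 6)] |= 1L << pos;
        }
    }

    private static void writeBits(long[] a, long bitPos, int width, long value) {
        if (width == 0) return;
        int word = (int) (bitPos >>> 6), off = (int) (bitPos & 63);
        a[word] |= value << off;
        if (off + width > 64) a[word + 1] |= value >>> (64 - off);
    }

    private static long readBits(long[] a, long bitPos, int width) {
        if (width == 0) return 0;
        int word = (int) (bitPos >>> 6), off = (int) (bitPos & 63);
        long v = a[word] >>> off;
        if (off + width > 64) v |= a[word + 1] << (64 - off);
        return v & ((1L << width) - 1);
    }

    /** Decodes all values in order: high part = (position of i-th set bit) - i. */
    public int[] decode() {
        int[] out = new int[numValues];
        int i = 0;
        for (int w = 0; w < highBits.length; w++) {
            for (long word = highBits[w]; word != 0; word &= word - 1) {
                long high = (long) w * 64 + Long.numberOfTrailingZeros(word) - i;
                out[i] = (int) ((high << numLowBits)
                        | readBits(lowBits, (long) i * numLowBits, numLowBits));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 8, 100, 101, 500, 999};
        TwoPassEliasFano ef = build(IntStream.of(docs).iterator(), 1000);
        System.out.println(ef.numLowBits + " " + Arrays.toString(ef.decode()));
        // prints: 7 [3, 7, 8, 100, 101, 500, 999]
    }
}
```

The extra pass costs one sequential read of the temporary set, which is why a
temporary structure with fast building and fast next() is attractive here.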
However, an even better way might be to use one or more temporary long arrays to
store the incoming doc ids directly in FOR format (without forming deltas and
without an index). This can be done because the maximum doc id value is known,
so the number of bits per value is fixed before the first doc id arrives.
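A sketch of this FOR-style layout in plain Java, assuming doc ids arrive in
increasing order and maxDoc is known in advance. FixedWidthDocBuffer is a
hypothetical class, not the PackedInts API. It stores every doc id unchanged at
one fixed bit width, so there are no deltas to undo, and value i starts at bit
i * bitsPerValue, which makes an index unnecessary. Growth here is plain
doubling with Arrays.copyOf.

```java
import java.util.Arrays;

public class FixedWidthDocBuffer {
    private final int bitsPerValue;
    private final long mask;
    private long[] blocks = new long[1];
    private int size;

    public FixedWidthDocBuffer(int maxDoc) {
        // Known before any doc id arrives: enough bits for the largest id, maxDoc - 1.
        bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxDoc - 1));
        mask = (1L << bitsPerValue) - 1;
    }

    /** Appends a doc id as-is (no delta) at bit position size * bitsPerValue. */
    public void add(int doc) {
        long bitPos = (long) size * bitsPerValue;
        int word = (int) (bitPos >>> 6), off = (int) (bitPos & 63);
        if (word + 1 >= blocks.length) {  // keep room for a value spanning two longs
            blocks = Arrays.copyOf(blocks, Math.max(word + 2, blocks.length * 2));
        }
        long v = doc & mask;
        blocks[word] |= v << off;
        if (off + bitsPerValue > 64) blocks[word + 1] |= v >>> (64 - off);
        size++;
    }

    /** Random access without an index: value i starts at bit i * bitsPerValue. */
    public int get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int word = (int) (bitPos >>> 6), off = (int) (bitPos & 63);
        long v = blocks[word] >>> off;
        if (off + bitsPerValue > 64) v |= blocks[word + 1] << (64 - off);
        return (int) (v & mask);
    }

    public int size() { return size; }

    public int bitsPerValue() { return bitsPerValue; }

    public static void main(String[] args) {
        FixedWidthDocBuffer buf = new FixedWidthDocBuffer(1_000_000);
        for (int d : new int[] {5, 42, 4096, 65_537, 999_999}) buf.add(d);
        System.out.println(buf.bitsPerValue() + " bits/value, third id = " + buf.get(2));
        // prints: 20 bits/value, third id = 4096
    }
}
```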
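While storing the doc ids, one can switch to an FBS (FixedBitSet) on the fly
when the total number of doc ids becomes too high, for example once the packed
values would take more space than a bit set over all doc ids. The existing
PackedInts code should be a nice fit for this.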
Since allocating the long arrays takes time, one can start with a single array
of, say, 1/512 of the maximum needed size, and continue into another (bigger)
array as long as necessary, or until an FBS is preferable.
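The on-the-fly switch to an FBS and the chunked growth can be sketched together
in plain Java. Everything here is an assumption for illustration: the class
name ChunkedDocIdBuilder, reading "maximum needed size" as about maxDoc bits
(beyond that a bit set is smaller), doubling the size of each new chunk,
switching as soon as the packed bits would exceed maxDoc bits, and a plain
long[] standing in for FixedBitSet. Doc ids must be added in strictly
increasing order.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedDocIdBuilder {
    private final int maxDoc;
    private final int bitsPerValue;
    private final long mask;
    private final List<long[]> chunks = new ArrayList<>();
    private long[] current;  // last chunk, being filled
    private int inCurrent;   // values stored in the current chunk
    private int size;        // total values added
    private long[] bits;     // bit set over [0, maxDoc); non-null after the switch

    public ChunkedDocIdBuilder(int maxDoc) {
        this.maxDoc = maxDoc;
        this.bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxDoc - 1));
        this.mask = (1L << bitsPerValue) - 1;
        // Assumption: the "maximum needed size" is about maxDoc bits, because past
        // that point a bit set is smaller. Start with 1/512 of it (at least one long).
        int maxWords = (maxDoc + 63) >>> 6;
        newChunk(Math.max(1, maxWords / 512));
    }

    private void newChunk(int words) {
        current = new long[words];
        chunks.add(current);
        inCurrent = 0;
    }

    private int capacity(long[] chunk) {
        return (int) ((long) chunk.length * 64 / bitsPerValue);
    }

    /** Adds a doc id; ids must be strictly increasing and in [0, maxDoc). */
    public void add(int doc) {
        if (bits == null && (long) (size + 1) * bitsPerValue > maxDoc) {
            switchToBitSet();  // the packed form would now be larger than a bit set
        }
        if (bits != null) {
            bits[doc >>> 6] |= 1L << doc;
        } else {
            if (inCurrent == capacity(current)) {
                newChunk(current.length * 2);  // continue into a bigger array (assumed 2x)
            }
            long bitPos = (long) inCurrent * bitsPerValue;
            int word = (int) (bitPos >>> 6), off = (int) (bitPos & 63);
            current[word] |= (long) doc << off;
            if (off + bitsPerValue > 64) current[word + 1] |= (long) doc >>> (64 - off);
            inCurrent++;
        }
        size++;
    }

    private int read(long[] chunk, int index) {
        long bitPos = (long) index * bitsPerValue;
        int word = (int) (bitPos >>> 6), off = (int) (bitPos & 63);
        long v = chunk[word] >>> off;
        if (off + bitsPerValue > 64) v |= chunk[word + 1] << (64 - off);
        return (int) (v & mask);
    }

    private void switchToBitSet() {
        int[] docs = toArray();  // still reads from the packed chunks
        bits = new long[(maxDoc + 63) >>> 6];
        for (int doc : docs) bits[doc >>> 6] |= 1L << doc;
        chunks.clear();
        current = null;
    }

    public boolean usesBitSet() { return bits != null; }

    /** Returns all doc ids added so far, in increasing order. */
    public int[] toArray() {
        int[] out = new int[size];
        int k = 0;
        if (bits != null) {
            for (int w = 0; w < bits.length; w++) {
                for (long word = bits[w]; word != 0; word &= word - 1) {
                    out[k++] = w * 64 + Long.numberOfTrailingZeros(word);
                }
            }
        } else {
            for (long[] chunk : chunks) {
                int n = (chunk == current) ? inCurrent : capacity(chunk);  // earlier chunks are full
                for (int i = 0; i < n; i++) out[k++] = read(chunk, i);
            }
        }
        return out;
    }
}
```

In this sketch earlier chunks are never copied while packing; only the switch
to the bit set reads all packed values back, once.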
> Also use EliasFanoDocIdSet in CachingWrapperFilter
> --------------------------------------------------
>
> Key: LUCENE-5293
> URL: https://issues.apache.org/jira/browse/LUCENE-5293
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: LUCENE-5293.patch, LUCENE-5293.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.1#6144)