[ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290477#comment-13290477
 ] 

Adrien Grand commented on LUCENE-2357:
--------------------------------------

I ran a quick test that indexes a few million documents with a single field 
(indexed, not stored, not analyzed, no term vectors, ...) with different ratios 
of deleted documents, RAM buffer sizes (between 1 and 50 MB) and merge factors 
(between 3 and 20). The overall speedup with {{PackedInts.FAST}} was between 
0.2% and 1.7% compared to {{PackedInts.COMPACT}} (I ran this test on a 
low-end computer, so other people might get slightly better results with the 
{{FAST}} version on a better machine). This is probably not worth the potential 
memory overhead. Would anyone object to replacing {{FAST}} with {{COMPACT}} 
for the docmaps instantiation?
                
> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2357
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2357
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, 
> LUCENE-2357.patch, LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing the absolute docID in the mapping, we 
> could store the number of deleted docs seen so far (so the remap would do a 
> lookup then a subtract).  This may add some CPU cost to merging but should 
> bring down transient RAM usage quite a bit.
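
The "lookup then a subtract" idea in the description above can be sketched 
roughly as follows. This is only an illustration, not the code from the 
attached patches: the class and method names (DocMapSketch, buildDelCounts, 
remap, bitsRequired) are hypothetical, and a plain int[] stands in for the 
packed ints array so that the example stays self-contained.

```java
// Hypothetical sketch: names are illustrative and not part of Lucene.
class DocMapSketch {

    /** Returns an array where entry i is the number of deleted docs with ID < i. */
    static int[] buildDelCounts(boolean[] deleted) {
        int[] delCounts = new int[deleted.length];
        int seen = 0;
        for (int docID = 0; docID < deleted.length; docID++) {
            delCounts[docID] = seen;   // deletions strictly before this doc
            if (deleted[docID]) {
                seen++;
            }
        }
        return delCounts;
    }

    /**
     * Remaps a live (non-deleted) docID: one lookup, then one subtract.
     * The result is meaningless for deleted docs; the caller checks that.
     */
    static int remap(int[] delCounts, int oldDocID) {
        return oldDocID - delCounts[oldDocID];
    }

    /** Bits needed to store values in the range [0, maxValue]. */
    static int bitsRequired(int maxValue) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
    }

    public static void main(String[] args) {
        boolean[] deleted = {false, true, false, false, true, false};
        int[] delCounts = buildDelCounts(deleted);
        for (int docID = 0; docID < deleted.length; docID++) {
            if (!deleted[docID]) {
                System.out.println(docID + " -> " + remap(delCounts, docID));
            }
        }
    }
}
```

The stored values are bounded by the number of deletions rather than by the 
segment's document count, so a packed array would need bitsRequired(number of 
deletions) bits per entry instead of bitsRequired(maxDoc - 1). That is where 
the RAM saving would come from when only a small fraction of the documents is 
deleted.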
