[ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284437#comment-13284437
 ] 

Adrien Grand commented on LUCENE-2357:
--------------------------------------

I implemented equals only for testing purposes (see TestSegmentMerger.java) and 
then hashCode for consistency. I can move the equals code to the test case if 
you prefer.

Regarding numDeletedDocs, I tried to add the following assert
{code}
assert docCount == reader.reader.numDocs()
    : "docCount=" + docCount + ", numDocs=" + reader.reader.numDocs();
{code}
to line 321 of SegmentMerger (before applying the patch), and it fails across a 
large number of tests (for example, run TestAddIndexes a few times and 
at least one of the {{testWithPendingDeletes*}} tests should fail). There used to be 
an assert in SegmentMerger, but it was removed in r1148938 
(http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java?r1=1147671&r2=1148938&pathrev=1148938&diff_format=h)
 so I assumed that {{numDeletedDocs()}} was unreliable and that the delete count 
should be computed from {{liveDocs}}. I am not familiar enough with the merge 
process to know whether an invariant is being broken here. Should I open a bug?
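
For reference, here is a minimal sketch of what "computing the del count from liveDocs" could look like. It is only an illustration, not the code in the patch: {{java.util.BitSet}} stands in for the real liveDocs, {{DelCountSketch}} and {{countDeletes}} are hypothetical names, and treating a null liveDocs as "no deletions" is an assumption of the sketch.

```java
import java.util.BitSet;

final class DelCountSketch {
    /**
     * Counts deleted documents by scanning liveDocs instead of trusting a
     * cached counter. Assumptions: a set bit means "live", and a null
     * liveDocs means the segment has no deletions.
     */
    static int countDeletes(BitSet liveDocs, int maxDoc) {
        if (liveDocs == null) {
            return 0;
        }
        int delCount = 0;
        for (int docID = 0; docID < maxDoc; docID++) {
            if (!liveDocs.get(docID)) {
                delCount++;
            }
        }
        return delCount;
    }

    public static void main(String[] args) {
        int maxDoc = 5;
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc); // every doc starts out live
        liveDocs.clear(1);       // delete doc 1
        liveDocs.clear(4);       // delete doc 4
        int delCount = countDeletes(liveDocs, maxDoc);
        int docCount = maxDoc - delCount;
        System.out.println("delCount=" + delCount + ", docCount=" + docCount);
    }
}
```

The scan is O(maxDoc), which is why a reliable cached count would be preferable if the invariant can be restored.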
                
> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2357
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2357
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 4.1
>
>         Attachments: LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing abs docID in the mapping, we could 
> store the number of del docs seen so far (so the remap would do a lookup then 
> a subtract).  This may add some CPU cost to merging but should bring down 
> transient RAM usage quite a bit.
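
The "lookup then subtract" idea in the description above can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: {{PackedCounts}} is a hypothetical, simplified fixed-width packed array (it is not Lucene's PackedInts API), {{DocMapSketch}} is a hypothetical name, and {{java.util.BitSet}} stands in for liveDocs. Each slot stores the number of deleted docs seen before that docID, so the width per entry depends on the delete count rather than on maxDoc.

```java
import java.util.BitSet;

/** Hypothetical fixed-width packed array; slots are write-once (start at 0). */
final class PackedCounts {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedCounts(int size, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(int) (((long) size * bitsPerValue + 63) / 64)];
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= (value & mask) << shift;
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next block
        if (spill > 0) {
            blocks[block + 1] |= (value & mask) >>> (bitsPerValue - spill);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }
}

/** Remaps docIDs by subtracting the number of deletions seen before each doc. */
final class DocMapSketch {
    private final BitSet liveDocs;
    private final PackedCounts delsBefore;

    DocMapSketch(BitSet liveDocs, int maxDoc) {
        this.liveDocs = liveDocs;
        int numDeleted = 0;
        for (int i = 0; i < maxDoc; i++) {
            if (!liveDocs.get(i)) numDeleted++;
        }
        // No stored value exceeds numDeleted, so size each slot for that maximum.
        int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(numDeleted));
        delsBefore = new PackedCounts(maxDoc, bits);
        int seen = 0;
        for (int i = 0; i < maxDoc; i++) {
            delsBefore.set(i, seen);
            if (!liveDocs.get(i)) seen++;
        }
    }

    /** Returns the compacted docID, or -1 if the document is deleted. */
    int remap(int docID) {
        return liveDocs.get(docID) ? docID - (int) delsBefore.get(docID) : -1;
    }

    public static void main(String[] args) {
        BitSet liveDocs = new BitSet(8);
        liveDocs.set(0, 8);
        liveDocs.clear(2);
        liveDocs.clear(5);
        DocMapSketch map = new DocMapSketch(liveDocs, 8);
        for (int docID = 0; docID < 8; docID++) {
            System.out.println(docID + " -> " + map.remap(docID));
        }
    }
}
```

The trade-off matches the description: each remap costs an extra packed read and a subtraction, while a segment with few deletions needs only a few bits per document instead of a full 32-bit int.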
