[
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284437#comment-13284437
]
Adrien Grand commented on LUCENE-2357:
--------------------------------------
I implemented equals only for testing purposes (see TestSegmentMerger.java) and
then hashCode for consistency. I can move the equals code to the test case if
you prefer.
Regarding numDeletedDocs, I tried to add the following assert
{code}
assert docCount == reader.reader.numDocs()
    : "docCount=" + docCount + ", numDocs=" + reader.reader.numDocs();
{code}
to line 321 of SegmentMerger (before applying the patch), and it fails in a
large number of tests (for example, run TestAddIndexes a few times and at
least one of the {{testWithPendingDeletes*}} tests should fail). There used to
be an assert in SegmentMerger, but it was removed in r1148938
(http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/java/org/apache/lucene/index/SegmentMerger.java?r1=1147671&r2=1148938&pathrev=1148938&diff_format=h),
so I assumed that {{numDeletedDocs()}} was unreliable and that the deletion
count should be computed from {{liveDocs}}. I am not familiar enough with the
merge process to know whether an invariant is being broken. Should I open a bug?
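
For illustration only (this is not code from the patch): a minimal sketch of deriving the deletion count from the live-docs bits rather than trusting a cached counter. It assumes a plain java.util.BitSet stands in for Lucene's {{liveDocs}}; the class and method names are hypothetical.

```java
import java.util.BitSet;

public class LiveDocsCount {
    /**
     * Counts deleted documents by scanning the live-docs bits.
     * Assumption: a set bit means "live"; a null bitset means "no deletions".
     */
    static int countDeleted(BitSet liveDocs, int maxDoc) {
        if (liveDocs == null) {
            return 0;
        }
        int deleted = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!liveDocs.get(doc)) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

The scan is O(maxDoc), but its result agrees with the bits by construction, which is what the failing assert above checks.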
> Reduce transient RAM usage while merging by using packed ints array for docID
> re-mapping
> ----------------------------------------------------------------------------------------
>
> Key: LUCENE-2357
> URL: https://issues.apache.org/jira/browse/LUCENE-2357
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due
> to fragmentation on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int
> array... and maybe instead of storing abs docID in the mapping, we could
> store the number of del docs seen so far (so the remap would do a lookup then
> a subtract). This may add some CPU cost to merging but should bring down
> transient RAM usage quite a bit.