[ http://issues.apache.org/jira/browse/LUCENE-738?page=comments#action_12456239 ]

Yonik Seeley commented on LUCENE-738:
-------------------------------------
Did a quick code review, everything looks good to me. +1

> read/write .del as d-gaps when the deleted bit vector is sufficiently sparse
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-738
>                 URL: http://issues.apache.org/jira/browse/LUCENE-738
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>         Attachments: del.dgap.patch.txt
>
>
> The .del file of a segment maintains info on deleted documents in that
> segment. The file exists only for segments that have deleted docs, so it
> does not exist for newly created segments (e.g. those resulting from a
> merge). Each time an index reader that deleted any document is closed, the
> .del file is rewritten. In fact, since the lock-less commits change, a new
> (generation of the) .del file is created on each such occasion.
> For small indexes there is no real problem with the current situation. But
> for very large indexes, each time such an index reader is closed, creating
> a new bit vector seems like unnecessary overhead when the bit vector is
> sparse (just a few docs were deleted). For instance, for an index with a
> segment of 1M docs, the sequence {open reader; delete 1 doc from that
> segment; close reader;} would write a file of ~128KB. Repeat this sequence
> 8 times: 8 new files with a total size of 1MB are written to disk.
> Whether this is a bottleneck or not depends on the application's delete
> pattern, but when deleted docs are sparse, writing just the d-gaps would
> save space and time.
> I have this (simple) change to BitVector running and am currently running
> some performance tests to convince myself that it is worthwhile.
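To make the size argument in the issue description concrete, here is a minimal sketch of d-gap encoding for a sparse set of deleted docs. It is not the code in del.dgap.patch.txt and it does not reproduce the on-disk format of Lucene's BitVector: the class name DGapSketch and all of its methods are hypothetical, it uses java.util.BitSet as a stand-in, and it assumes the gaps are taken between deleted doc ids and written as variable-length ints.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical illustration only; not Lucene's BitVector and not its file format.
public class DGapSketch {

    // Write a non-negative int using 7 bits per byte, low-order group first;
    // the high bit of each byte marks "more bytes follow".
    static void writeVInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // Inverse of writeVInt.
    static int readVInt(DataInputStream in) throws IOException {
        int b = in.readUnsignedByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readUnsignedByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Sparse form: the number of deleted docs, then the gap from each deleted
    // doc id to the previous one (the first gap is measured from 0).
    static byte[] encodeDGaps(BitSet deleted) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeVInt(out, deleted.cardinality());
        int last = 0;
        for (int doc = deleted.nextSetBit(0); doc >= 0; doc = deleted.nextSetBit(doc + 1)) {
            writeVInt(out, doc - last);
            last = doc;
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Rebuild the set of deleted docs by summing the gaps.
    static BitSet decodeDGaps(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        BitSet deleted = new BitSet();
        int count = readVInt(in);
        int doc = 0;
        for (int i = 0; i < count; i++) {
            doc += readVInt(in);
            deleted.set(doc);
        }
        return deleted;
    }

    // Size of the dense form: one bit per document, rounded up to whole bytes.
    static int denseSizeInBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }

    public static void main(String[] args) throws IOException {
        int maxDoc = 1 << 20;              // roughly the 1M-doc segment from the description
        BitSet deleted = new BitSet(maxDoc);
        deleted.set(123456);               // a single deleted document
        byte[] sparse = encodeDGaps(deleted);
        System.out.println("dense bytes:  " + denseSizeInBytes(maxDoc));
        System.out.println("sparse bytes: " + sparse.length);
        System.out.println("round trip ok: " + decodeDGaps(sparse).equals(deleted));
    }
}
```

A writer along these lines would presumably compare the two sizes and fall back to the dense bit vector once deletions stop being sparse; the exact threshold is a design choice that this sketch does not attempt to pin down.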