[ http://issues.apache.org/jira/browse/LUCENE-738?page=all ]

Doron Cohen updated LUCENE-738:
-------------------------------

    Attachment: FileFormatDoc.patch.txt

FileFormat document updated to reflect this format change.

> read/write .del as d-gaps when the deleted bit vector is sufficiently sparse
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-738
>                 URL: http://issues.apache.org/jira/browse/LUCENE-738
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>         Attachments: del.dgap.patch.txt, FileFormatDoc.patch.txt
>
>
> The .del file of a segment records which documents in that segment have
> been deleted. The file exists only for segments that have deleted docs, so
> it does not exist for newly created segments (e.g. those resulting from a
> merge). Each time an index reader that deleted any document is closed, the
> .del file is rewritten. In fact, since the lock-less commits change, a new
> (generation of the) .del file is created on each such occasion.
> For small indexes the current situation is not a real problem. But for
> very large indexes, writing a whole new bit vector each time such an index
> reader is closed seems like unnecessary overhead when the bit vector is
> sparse (just a few docs were deleted). For instance, for an index with a
> segment of 1M docs, the sequence {open reader; delete 1 doc from that
> segment; close reader;} would write a file of ~128KB (1M bits at 8 bits
> per byte). Repeat this sequence 8 times and 8 new files with a total size
> of ~1MB are written to disk.
> Whether this is a bottleneck depends on the application's delete pattern,
> but when the deleted docs are sparse, writing just the d-gaps would save
> space and time.
> I have this (simple) change to BitVector running and am currently running
> some performance tests to convince myself that it is worthwhile.
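
As a rough illustration of the d-gap idea in the description above, the sketch below stores the sorted ids of deleted docs as the differences (gaps) between consecutive ids, each written as a variable-length integer. This is not the code in del.dgap.patch.txt and may not match the on-disk format that patch uses; the class name DGapSketch and its methods are hypothetical and exist only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of d-gap encoding; not Lucene's BitVector implementation.
class DGapSketch {
    // Write a non-negative int as a variable-length integer: 7 data bits per
    // byte, with the high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode strictly increasing doc ids as gaps between consecutive ids.
    // The first gap is measured from 0.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : sortedDocIds) {
            writeVInt(out, id - prev);
            prev = id;
        }
        return out.toByteArray();
    }

    // Decode until the buffer is exhausted, summing gaps back into doc ids.
    static int[] decode(byte[] data) {
        List<Integer> ids = new ArrayList<>();
        int pos = 0;
        int prev = 0;
        while (pos < data.length) {
            int gap = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap;
            ids.add(prev);
        }
        int[] result = new int[ids.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = ids.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] deleted = {5, 1000, 1003, 999999};
        byte[] encoded = encode(deleted);
        int fullBitVectorBytes = (1 << 20) / 8; // 1M-doc segment, one bit per doc
        System.out.println("d-gap bytes: " + encoded.length
                + ", full bit vector bytes: " + fullBitVectorBytes);
    }
}
```

With only a handful of deletions in a 1M-doc segment, the gap list occupies a few bytes while the full bit vector occupies 131072 bytes regardless of how many bits are set. The advantage shrinks as deletions become dense, which is presumably why the issue title restricts the change to vectors that are "sufficiently sparse".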
