Re: possible segment merge improvement?

robert engels Wed, 31 Oct 2007 22:06:42 -0800

It seems that the following are needed:

FieldInfos.hashCode(); // to allow for fast equals failure
FieldInfos.equals();
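
A minimal sketch of how such a hashCode()/equals() pair could look. FieldInfosSketch and its Field entries are hypothetical stand-ins, not Lucene's real FieldInfos, and the attributes compared (name, indexed, storeTermVector) are only an assumption about what matters for merge compatibility. The list position plays the role of the field number, so two instances are equal only if they number the same fields identically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for FieldInfos; not the real Lucene class.
final class FieldInfosSketch {
    // One entry per field; its index in the list acts as the field number.
    static final class Field {
        final String name;
        final boolean indexed;
        final boolean storeTermVector;

        Field(String name, boolean indexed, boolean storeTermVector) {
            this.name = name;
            this.indexed = indexed;
            this.storeTermVector = storeTermVector;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Field)) return false;
            Field f = (Field) o;
            return name.equals(f.name)
                    && indexed == f.indexed
                    && storeTermVector == f.storeTermVector;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, indexed, storeTermVector);
        }
    }

    private final List<Field> byNumber = new ArrayList<>();
    private int cachedHash; // 0 means "not computed yet"

    FieldInfosSketch add(String name, boolean indexed, boolean storeTermVector) {
        byNumber.add(new Field(name, indexed, storeTermVector));
        cachedHash = 0; // invalidate the cached hash on mutation
        return this;
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            h = byNumber.hashCode();
            cachedHash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FieldInfosSketch)) return false;
        FieldInfosSketch other = (FieldInfosSketch) o;
        // Fast failure: different hashes can never be equal.
        if (hashCode() != other.hashCode()) return false;
        return byNumber.equals(other.byNumber);
    }
}
```

Caching the hash keeps the fast-failure check cheap when the same FieldInfos is compared against several segments.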


for most efficient buffer reuse during merge to avoid GC, add

int FieldsReader.doclength(int doc);
int FieldsReader.binarydoc(int doc,byte[] buffer);

this will allow the caller to reuse the existing buffer if large enough, or create a new one


and

FieldsWriter.addBinaryDocument(byte[] buffer,int len);

All of the above methods are trivial.
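
As a rough illustration of why the reader methods are small, here is a sketch that assumes the stored documents sit back to back in one byte array with a per-document table of start offsets (similar in spirit to an index file pointing into a data file). StoredDocsSketch, its fields, and the choice to throw on a too-small buffer are all hypothetical; only the method names doclength and binarydoc come from the proposal above.

```java
// Hypothetical in-memory model of a stored-fields reader; not Lucene's FieldsReader.
final class StoredDocsSketch {
    private final long[] offsets; // start position of each document in data
    private final byte[] data;    // all documents, stored back to back

    StoredDocsSketch(long[] offsets, byte[] data) {
        this.offsets = offsets;
        this.data = data;
    }

    // Length of a document is the distance to the next document's start,
    // or to the end of the data for the last document.
    int doclength(int doc) {
        long start = offsets[doc];
        long end = (doc + 1 < offsets.length) ? offsets[doc + 1] : data.length;
        return (int) (end - start);
    }

    // Copies the raw bytes of one document into the caller's buffer and
    // returns the number of bytes copied. The caller sizes the buffer
    // using doclength(doc) beforehand.
    int binarydoc(int doc, byte[] buffer) {
        int len = doclength(doc);
        if (len > buffer.length) {
            throw new IllegalArgumentException("buffer too small: need " + len);
        }
        System.arraycopy(data, (int) offsets[doc], buffer, 0, len);
        return len;
    }
}
```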

SegmentMerger just needs to be changed to compare the readers to be merged, and if all have equal FieldInfos, then use a short circuit copy similar to


byte[] buffer = new byte[1024];

for each reader {
    for doc in reader {
        if doc not deleted {
            int len = reader.doclength(doc);
            if (len > buffer.length) {
                buffer = new byte[len * 2]; // allow for growth
            }
            reader.binarydoc(doc, buffer);
            newsegment.addBinaryDocument(buffer, len);
        }
    }
}
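
The loop above could be written as compilable Java along these lines. This is only a sketch: the RawDocReader and RawDocWriter interfaces, maxDoc(), isDeleted(), and the in-memory ArrayReader are assumptions made so the example is self-contained; only doclength, binarydoc and addBinaryDocument come from the proposal.

```java
import java.util.List;

final class RawMergeSketch {
    // Hypothetical reader view; maxDoc() and isDeleted() are assumed helpers.
    interface RawDocReader {
        int maxDoc();
        boolean isDeleted(int doc);
        int doclength(int doc);
        int binarydoc(int doc, byte[] buffer); // fills buffer, returns bytes copied
    }

    // Hypothetical writer view for the new segment.
    interface RawDocWriter {
        void addBinaryDocument(byte[] buffer, int len);
    }

    // Copies every live document from all readers using one shared buffer.
    // Returns the number of documents copied.
    static int copyAll(List<? extends RawDocReader> readers, RawDocWriter out) {
        byte[] buffer = new byte[1024];
        int copied = 0;
        for (RawDocReader reader : readers) {
            for (int doc = 0; doc < reader.maxDoc(); doc++) {
                if (reader.isDeleted(doc)) {
                    continue;
                }
                int len = reader.doclength(doc);
                if (len > buffer.length) {
                    buffer = new byte[len * 2]; // allow for growth
                }
                reader.binarydoc(doc, buffer);
                out.addBinaryDocument(buffer, len);
                copied++;
            }
        }
        return copied;
    }

    // In-memory reader for demonstration; a null entry marks a deleted document.
    static final class ArrayReader implements RawDocReader {
        private final byte[][] docs;

        ArrayReader(byte[][] docs) {
            this.docs = docs;
        }

        public int maxDoc() { return docs.length; }

        public boolean isDeleted(int doc) { return docs[doc] == null; }

        public int doclength(int doc) { return docs[doc].length; }

        public int binarydoc(int doc, byte[] buffer) {
            System.arraycopy(docs[doc], 0, buffer, 0, docs[doc].length);
            return docs[doc].length;
        }
    }
}
```

The buffer is allocated once and only replaced when a document exceeds it, so a merge of many small documents performs no per-document allocation.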



On Nov 1, 2007, at 12:30 AM, jian chen wrote:

Hi, Robert,

That's a brilliant idea! Thanks so much for suggesting that.

Cheers,

Jian

On 10/31/07, robert engels <[EMAIL PROTECTED]> wrote:


Currently, when merging segments, every document is parsed and then
rewritten since the field numbers may differ between the segments
(compressed data is not uncompressed in the latest versions).

It would seem that in many (if not most) Lucene uses, the set of fields
stored within each document in an index is relatively static,
probably changing for all documents added after point X, if at all.

Why not check the fields dictionary for the segments being merged,
and if the same, just copy the binary data directly?

In the common case this should be a vast improvement.

Anyone worked on anything like this? Am I missing something?

Robert Engels



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



