This part is indeed quite tricky... I'll try to take a stab at it.

Paul Elschot wrote:

On Friday 19 September 2008 17:05:29, Michael McCandless wrote:
Not quite, because how positions are encoded depends on whether any
payload appeared in that segment.

However, if 1) the input is a SegmentReader (since in general we can
merge any IndexReader), and 2) its format is "congruent" with the
format we are writing (i.e. both do or both don't use the payloads format),
which ought to be true the vast majority of the time, then I think we
could simply copy bytes.  Since the next TermInfo tells us the
proxPointer where it begins, we know exactly how many bytes to copy.
I think this'd be a nice optimization!
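To make the byte-range idea concrete, here is a minimal sketch in plain java.io. It is not Lucene code: the class ProxRangeCopy, its methods, and the array of start pointers are hypothetical stand-ins for the TermInfo proxPointer values and the prox streams. It only shows that the length to copy is the difference between two successive start pointers (or the distance to the end of the file for the last term), and that the bytes are copied without being interpreted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration only; none of these names are Lucene API. */
class ProxRangeCopy {

    /**
     * Length of one term's positions data: the distance from its own start
     * pointer to the next term's start pointer, or to the end of the file
     * for the last term.
     */
    static long rangeLength(long[] proxPointers, int termIndex, long fileLength) {
        long end = (termIndex + 1 < proxPointers.length)
                ? proxPointers[termIndex + 1]
                : fileLength;
        return end - proxPointers[termIndex];
    }

    /** Copies exactly numBytes from in to out without decoding them. */
    static void copyBytes(InputStream in, OutputStream out, long numBytes)
            throws IOException {
        byte[] buffer = new byte[4096];
        long remaining = numBytes;
        while (remaining > 0) {
            int chunk = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, chunk);
            if (read < 0) {
                throw new EOFException("input ended with " + remaining + " bytes left");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    /** Returns the raw positions bytes of one term from an in-memory prox file image. */
    static byte[] copyTermRange(byte[] proxFile, long[] proxPointers, int termIndex) {
        int start = (int) proxPointers[termIndex];
        long length = rangeLength(proxPointers, termIndex, proxFile.length);
        // Starting the stream at the pointer plays the role of a seek.
        InputStream in = new ByteArrayInputStream(proxFile, start, proxFile.length - start);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyBytes(in, out, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

In a real merge the two pointers would come from the current and the next TermInfo of the input segment, and the copy would go from the input prox stream to proxOutput.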

I tried to find a way to do this, but I'm stuck at the point where
the proxPointer is needed from a TermInfo.
I got this far (uncompiled code, smi is the SegmentMergeInfo
that is currently merged):

   if (smi.reader instanceof SegmentReader) {
     // smi.reader is declared as IndexReader, so a cast is needed here:
     SegmentReader inputReader = (SegmentReader) smi.reader;
     boolean readerStorePayloads =
       inputReader.fieldInfos.fieldInfo(smi.term.field).storePayloads;
     if (storePayloads == readerStorePayloads) {
       // take the difference of the two prox pointers
       // (a long, since the prox pointers are file offsets):
       long positionsLength = inputReader.tis. ... -  ...;
       // do a direct byte copy from inputReader to proxOutput:
       ... ;
     }
   }

but I could not find out how to get from the TermInfosReader
at inputReader.tis to the next prox pointer.

SegmentMerger never needs to index the positions by using a
proxPointer itself, as it accesses all positions serially. This leaves
me without an example of how to use proxPointer from a TermInfo.

Any tips on how to continue?

Regards,
Paul Elschot


Mike

Paul Elschot wrote:
I'm looking at the for loop in SegmentMerger.java at line 666,
which completely interprets the input positions/payloads for
an input term at a document.

The positions/payloads don't change when they are merged, is that
correct? I'm wondering whether this loop could be replaced by a
direct copy from the input postings to proxOutput.
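As a small illustration of why a direct copy looks plausible, the sketch below decodes a run of delta-encoded VInt positions and writes them out again, then checks that the bytes are unchanged. It assumes the simple case only: one document, no payloads, positions stored as VInt deltas. The class PositionRoundTrip and its methods are hypothetical and self-contained, not the SegmentMerger code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical illustration of decoding and re-encoding positions (no payloads). */
class PositionRoundTrip {

    /** Writes a non-negative int as a VInt: 7 bits per byte, high bit set when more bytes follow. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Encodes ascending positions of one document as VInt deltas. */
    static byte[] encodePositions(int[] positions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int position : positions) {
            writeVInt(out, position - last);
            last = position;
        }
        return out.toByteArray();
    }

    /** Decodes count positions from VInt deltas, as a merge loop would do. */
    static int[] decodePositions(byte[] bytes, int count) {
        int[] positions = new int[count];
        int offset = 0;
        int last = 0;
        for (int i = 0; i < count; i++) {
            int b = bytes[offset++];
            int delta = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[offset++];
                delta |= (b & 0x7F) << shift;
            }
            last += delta;
            positions[i] = last;
        }
        return positions;
    }

    /** True when decoding and re-encoding reproduces the input bytes exactly. */
    static boolean reencodingIsIdentity(int[] positions) {
        byte[] original = encodePositions(positions);
        int[] decoded = decodePositions(original, positions.length);
        return Arrays.equals(original, encodePositions(decoded));
    }
}
```

If the round trip always reproduces the same bytes, the decoding loop does no useful work during a merge, which is what makes copying the bytes attractive.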

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
