I wrote...

Unfortunately, once the changes to TermBuffer, TermInfosWriter, and StringHelper are applied, indexing slows down by about 20%. Perhaps this can be blamed on all the calls to getBytes("UTF-8") in TermInfosWriter? Could an alternative implementation using ByteBuffer, CharsetDecoder, and CharsetEncoder mitigate the problem?

Nope.

The version of writeTerm below is about the same speed as the one with the calls to getBytes("UTF-8").

I think I'll take a crack at a custom charsToUTF8 converter algo.
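For illustration, a hand-rolled converter of the kind mentioned above might look roughly like this. It is only a sketch: the class name, the charsToUTF8 signature (char array in, caller-supplied byte array out, byte count returned), and the choice to write unpaired surrogates as '?' are all assumptions, not existing Lucene code. The caller must size dst to at least 3 bytes per char, which covers the worst case (a surrogate pair is 2 chars and 4 bytes).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {

  // Hypothetical converter: encodes len chars of src, starting at off,
  // into dst as UTF-8 and returns the number of bytes written.
  // dst must hold at least 3 * len bytes.
  static int charsToUTF8(char[] src, int off, int len, byte[] dst) {
    int end = off + len;
    int upto = 0;
    for (int i = off; i < end; i++) {
      int c = src[i];
      if (c < 0x80) {
        dst[upto++] = (byte) c;                          // 1 byte: ASCII
      } else if (c < 0x800) {
        dst[upto++] = (byte) (0xC0 | (c >> 6));          // 2 bytes
        dst[upto++] = (byte) (0x80 | (c & 0x3F));
      } else if (c < 0xD800 || c > 0xDFFF) {
        dst[upto++] = (byte) (0xE0 | (c >> 12));         // 3 bytes: rest of the BMP
        dst[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
        dst[upto++] = (byte) (0x80 | (c & 0x3F));
      } else if (c <= 0xDBFF && i + 1 < end
                 && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
        // High surrogate followed by a low surrogate: combine them into
        // one supplementary code point and write it as 4 bytes.
        int cp = 0x10000 + ((c - 0xD800) << 10) + (src[++i] - 0xDC00);
        dst[upto++] = (byte) (0xF0 | (cp >> 18));
        dst[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        dst[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        dst[upto++] = (byte) (0x80 | (cp & 0x3F));
      } else {
        dst[upto++] = (byte) '?';                        // unpaired surrogate
      }
    }
    return upto;
  }

  public static void main(String[] args) {
    String s = "caf\u00e9 \u4e2d \ud83d\ude00";
    byte[] buf = new byte[3 * s.length()];
    int n = charsToUTF8(s.toCharArray(), 0, s.length(), buf);
    byte[] viaJdk = s.getBytes(StandardCharsets.UTF_8);
    // Compare the hand-rolled output with the JDK encoder's output.
    System.out.println(Arrays.equals(Arrays.copyOf(buf, n), viaJdk));
  }
}
```

Writing straight into a reusable byte[] avoids allocating a new array per term, which is one plausible source of the getBytes("UTF-8") overhead; whether it actually beats the JDK encoder would have to be measured.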

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

//-------------------------------------------------------------------------

  private final void writeTerm(Term term)
       throws IOException {
    // Encode the term text as UTF-8 into byteBuf. If the buffer is too
    // small, grow it and start the encoding over from scratch.
    while (true) {
      byteBuf.clear();
      utf8Encoder.reset();
      CoderResult status = utf8Encoder.encode(CharBuffer.wrap(term.text()),
          byteBuf, true);                       // true: this is all the input
      if (status.isUnderflow()) {
        status = utf8Encoder.flush(byteBuf);    // finish the encoding operation
      }
      if (status.isError()) {
        status.throwException();                // e.g. an unpaired surrogate
      }
      if (status.isOverflow()) {
        bufSize += 32;
        byteBuf = ByteBuffer.allocate(bufSize);
      }
      else {
        break;
      }
    }
    int totalLength = byteBuf.position();       // number of UTF-8 bytes
    int start = StringHelper.bytesDifference(lastByteBuf, byteBuf);
    int length = totalLength - start;

    output.writeVInt(start);                    // write shared prefix length
    output.writeVInt(length);                   // write delta length

    byte[] bytes = byteBuf.array();
    for (int i = start; i < totalLength; i++) {
      output.writeByte(bytes[i]);               // write delta UTF-8 bytes
    }
    output.writeVInt(fieldInfos.fieldNumber(term.field)); // write field num

    lastTerm = term;
    // swap byteBuf and lastByteBuf so the next term is diffed against this one
    scratchByteBuf = lastByteBuf;
    lastByteBuf = byteBuf;
    byteBuf = scratchByteBuf;
  }
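StringHelper.bytesDifference is called in writeTerm but its body is not shown. What follows is a hypothetical stand-in, assuming it returns the length of the shared byte prefix and that, as in writeTerm, each buffer has just been filled so its valid bytes run from index 0 up to position(). The real helper in the patch may differ.

```java
import java.nio.ByteBuffer;

public class BytesDifferenceSketch {

  // Hypothetical stand-in for StringHelper.bytesDifference: returns the
  // number of leading bytes the two buffers' contents have in common.
  // Valid bytes are assumed to be [0, position()) in each buffer.
  static int bytesDifference(ByteBuffer a, ByteBuffer b) {
    int limit = Math.min(a.position(), b.position());
    for (int i = 0; i < limit; i++) {
      if (a.get(i) != b.get(i)) {
        return i;          // first mismatching byte ends the shared prefix
      }
    }
    return limit;          // one term's bytes are a prefix of the other's
  }

  public static void main(String[] args) throws Exception {
    ByteBuffer last = ByteBuffer.allocate(16);
    ByteBuffer cur = ByteBuffer.allocate(16);
    last.put("apple".getBytes("UTF-8"));
    cur.put("apply".getBytes("UTF-8"));
    int start = bytesDifference(last, cur);
    System.out.println(start + " shared, " + (cur.position() - start) + " new");
  }
}
```

For the sorted terms "apple" then "apply", this yields a shared prefix of 4 bytes and a delta of 1 byte, which is exactly the pair of VInts writeTerm emits before the delta bytes. Because the absolute get(int) is used, neither buffer's position is disturbed.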
