I wrote...
> Unfortunately, once the changes to TermBuffer, TermInfosWriter, and
> StringHelper are applied, execution speed at index-time suffers a
> slowdown of about 20%. Perhaps this can be blamed on all the calls
> to getBytes("UTF-8") in TermInfosWriter? Maybe alternative
> implementations using ByteBuffer, CharsetDecoder, and
> CharsetEncoder are possible that can mitigate the problem?
Nope.
The version of writeTerm below is about the same speed as the one
with the calls to getBytes("UTF-8").
I think I'll take a crack at a custom charsToUTF8 converter algo.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
//-------------------------------------------------------------------------
  private final void writeTerm(Term term)
       throws IOException {
    byteBuf.clear();
    while (true) {
      CoderResult status = utf8Encoder.encode(CharBuffer.wrap(term.text()),
                                              byteBuf, false);
      if (status.isOverflow()) {
        bufSize += 32;
        byteBuf = ByteBuffer.allocate(bufSize);
        utf8Encoder.reset();
      }
      else {
        break;
      }
    }
    int totalLength = byteBuf.position();
    int start = StringHelper.bytesDifference(lastByteBuf, byteBuf);
    int length = totalLength - start;
    output.writeVInt(start);                    // write shared prefix length
    output.writeVInt(length);                   // write delta length
    byte[] bytes = byteBuf.array();
    for (int i = start ; i < totalLength; i++) {
      output.writeByte(bytes[i]);               // write delta UTF-8 bytes
    }
    output.writeVInt(fieldInfos.fieldNumber(term.field)); // write field num
    lastTerm = term;
    // swap byteBuf and lastByteBuf
    scratchByteBuf = lastByteBuf;
    lastByteBuf = byteBuf;
    byteBuf = scratchByteBuf;
  }
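
For reference, a rough sketch of what a hand-rolled charsToUTF8 converter
could look like. This is not the converter mentioned above and not Lucene
code: the class name Utf8Sketch, the method names, and the choice to write
U+FFFD for unpaired surrogates are all assumptions made for illustration.
The idea is simply to walk the UTF-16 chars once and emit 1 to 4 bytes per
code point into a caller-supplied array, avoiding a new allocation per term.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class Utf8Sketch {

  // Upper bound on output size: a single UTF-16 char needs at most 3 bytes,
  // and a surrogate pair (2 chars) needs 4, so 3 bytes per char always suffices.
  static int maxBytes(int charCount) {
    return charCount * 3;
  }

  // Encodes text as UTF-8 into out, starting at index 0, and returns the
  // number of bytes written. Assumes out.length >= maxBytes(text.length()).
  // Unpaired surrogates are written as U+FFFD (an assumed policy).
  static int charsToUTF8(CharSequence text, byte[] out) {
    int upto = 0;
    final int len = text.length();
    for (int i = 0; i < len; i++) {
      int c = text.charAt(i);
      if (c < 0x80) {
        // 1 byte: ASCII
        out[upto++] = (byte) c;
      } else if (c < 0x800) {
        // 2 bytes
        out[upto++] = (byte) (0xC0 | (c >> 6));
        out[upto++] = (byte) (0x80 | (c & 0x3F));
      } else if (c < 0xD800 || c > 0xDFFF) {
        // 3 bytes: rest of the Basic Multilingual Plane
        out[upto++] = (byte) (0xE0 | (c >> 12));
        out[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
        out[upto++] = (byte) (0x80 | (c & 0x3F));
      } else {
        // Surrogate: combine a valid high/low pair into one code point.
        int low = (c <= 0xDBFF && i + 1 < len) ? text.charAt(i + 1) : 0;
        if (low >= 0xDC00 && low <= 0xDFFF) {
          int cp = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
          i++; // consumed the low surrogate as well
          // 4 bytes: supplementary code point
          out[upto++] = (byte) (0xF0 | (cp >> 18));
          out[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
          out[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
          out[upto++] = (byte) (0x80 | (cp & 0x3F));
        } else {
          // Unpaired surrogate: emit U+FFFD (EF BF BD)
          out[upto++] = (byte) 0xEF;
          out[upto++] = (byte) 0xBF;
          out[upto++] = (byte) 0xBD;
        }
      }
    }
    return upto;
  }
}
```

A caller would keep one byte[] around, grow it only when
maxBytes(term.text().length()) exceeds its length, and then write the bytes
from the shared-prefix offset up to the returned length. Whether this beats
getBytes("UTF-8") or CharsetEncoder would have to be measured.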