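That method could easily be changed to:
public final String readString() throws IOException {
  int length = readVInt(); // string length in bytes rather than chars
  // readBytes(length) may return a reused buffer longer than length,
  // so decode only the first length bytes
  return new String(readBytes(length), 0, length, "UTF-8");
}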
readBytes() could reuse the same array if it were large enough. Then only the
single char[] would be created, inside the String code.
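A self-contained sketch of this proposal, assuming a hypothetical format in which each string is stored as a Lucene-style VInt byte count followed by UTF-8 bytes (not the char-count format Lucene actually uses here). `Utf8StringReader` is an illustrative name; it reads from a plain `java.io.InputStream` rather than Lucene's `InputStream` class and uses `StandardCharsets` (Java 7+).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical reader: strings stored as a VInt byte length followed by UTF-8 bytes. */
final class Utf8StringReader {
  private final DataInputStream in;
  private byte[] bytes; // reused across calls, grown only when too small

  Utf8StringReader(InputStream in) {
    this.in = new DataInputStream(in);
  }

  /** Lucene-style VInt: 7 bits per byte, low-order group first, high bit set while more bytes follow. */
  int readVInt() throws IOException {
    byte b = in.readByte();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.readByte();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  String readString() throws IOException {
    int length = readVInt(); // byte count, not char count
    if (bytes == null || length > bytes.length) {
      bytes = new byte[length];
    }
    in.readFully(bytes, 0, length);
    // Pass offset and length: the reused buffer may be longer than this string.
    return new String(bytes, 0, length, StandardCharsets.UTF_8);
  }
}
```

Decoding with an explicit offset and length is what makes buffer reuse safe: a later, shorter string leaves stale bytes at the end of the array, and those must not be decoded.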
-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 11:28 AM
To: [email protected]
Subject: Re: Lucene does NOT use UTF-8.
> How will the difference impact String memory allocations? Looking at the
> String code, I can't see where it would make an impact.
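This is from Lucene's InputStream:
public final String readString() throws IOException {
  int length = readVInt();                // length prefix counts chars, not bytes
  if (chars == null || length > chars.length)
    chars = new char[length];             // grow the reusable char buffer only when needed
  readChars(chars, 0, length);            // decode length chars into the buffer
  return new String(chars, 0, length);    // String makes its own copy of the chars
}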
If you know the length in bytes, you still have to allocate that many chars
(even though the number of chars may be less than the number of bytes). Not
a big deal IMHO.
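The over-allocation is easy to quantify. A char buffer sized from the UTF-8 byte count is always large enough, since every Java char needs at least one UTF-8 byte (a supplementary character is 2 chars and 4 bytes), but it can be up to three times larger than needed. `CharBufferSizing` is an illustrative name for this sketch.

```java
import java.nio.charset.StandardCharsets;

final class CharBufferSizing {
  /** Chars a reader would allocate if it sized its buffer from the UTF-8 byte count. */
  static int allocatedChars(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    // ASCII, a Latin-1 accent, two CJK ideographs, one supplementary character (U+1D11E)
    String[] samples = {"hello", "h\u00e9llo", "\u65e5\u672c", "\uD834\uDD1E"};
    for (String s : samples) {
      // s.length() is the number of Java chars actually needed
      System.out.println(allocatedChars(s) + " allocated, " + s.length() + " used");
    }
  }
}
```

This prints `5 allocated, 5 used`, `6 allocated, 5 used`, `6 allocated, 2 used` and `4 allocated, 2 used`: some waste for non-ASCII text, none for plain ASCII.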
A bigger pain is on the writing side, where you can't stream things because
you don't know what the length is going to be (in either bytes *or* UTF-8
chars).
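To make the writing-side cost concrete, here is a sketch of a writer for the same hypothetical byte-length-prefixed format (`Utf8StringWriter` is an illustrative name, not a Lucene class). Because the VInt length precedes the data, the writer must encode the entire string into a byte[] before emitting anything; it cannot stream characters out as it encodes them.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical writer: a VInt byte length followed by the UTF-8 bytes. */
final class Utf8StringWriter {
  private final DataOutputStream out;

  Utf8StringWriter(OutputStream out) {
    this.out = new DataOutputStream(out);
  }

  /** Lucene-style VInt: low 7 bits first, high bit marks "more bytes follow". */
  void writeVInt(int i) throws IOException {
    while ((i & ~0x7F) != 0) {
      out.writeByte((i & 0x7F) | 0x80);
      i >>>= 7;
    }
    out.writeByte(i);
  }

  void writeString(String s) throws IOException {
    // The length must be known before the first byte is written,
    // so the whole string is encoded into memory up front.
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    writeVInt(bytes.length);
    out.write(bytes);
  }
}
```

A char-count prefix has the same problem whenever the text arrives from a stream of unknown length; only the unit being counted differs.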
So it turns out that Java's 16-bit chars were just a waste... it's still a
multibyte format *and* it takes up more space. UTF-8 would have been nice:
no conversions necessary.
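Both halves of that complaint can be checked in a few lines of Java: a character outside the Basic Multilingual Plane occupies two 16-bit chars (a surrogate pair), and ASCII text takes twice as many bytes in UTF-16 as in UTF-8. `Utf16VsUtf8` is an illustrative name; UTF-16BE is used so that no byte-order mark is counted.

```java
import java.nio.charset.StandardCharsets;

final class Utf16VsUtf8 {
  // UTF-16BE: big-endian, no byte-order mark added by the encoder
  static int utf16Bytes(String s) { return s.getBytes(StandardCharsets.UTF_16BE).length; }

  static int utf8Bytes(String s) { return s.getBytes(StandardCharsets.UTF_8).length; }

  public static void main(String[] args) {
    String clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP
    System.out.println(clef.length());                         // 2 (a surrogate pair)
    System.out.println(clef.codePointCount(0, clef.length())); // 1 code point
    String ascii = "lucene";
    System.out.println(utf16Bytes(ascii)); // 12
    System.out.println(utf8Bytes(ascii));  // 6
  }
}
```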
-Yonik