Re: StandardCharset vs. StandardCharsets

Rémi Forax Sat, 07 May 2011 11:34:37 -0700

Hi Ulf,

the javadoc doesn't say explicitly that the result of charset.newDecoder() will be used, so I don't see the point.


I even read the last sentence:

"The CharsetDecoder <http://download.java.net/jdk7/docs/api/java/nio/charset/CharsetDecoder.html> class should be used when more control over the decoding process is required."

as a way of saying that it's ok to reuse a previously existing decoder.
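The difference in control is easy to see in practice: the String(byte[], Charset) constructor always substitutes the replacement string for malformed input, while a decoder obtained from newDecoder() can be configured to report it. A minimal sketch, assuming Java 7+ (for java.nio.charset.StandardCharsets); the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Hypothetical helper: decodes UTF-8 but throws instead of substituting U+FFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // REPORT is already the default,
                .onUnmappableCharacter(CodingErrorAction.REPORT); // set here only for clarity
        return dec.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] bad = {(byte) 0x61, (byte) 0xFF};  // 'a' followed by a byte that is never valid UTF-8
        // Lenient: the String constructor silently replaces the bad byte.
        System.out.println(new String(bad, StandardCharsets.UTF_8).length()); // 2
        try {
            decodeStrict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder rejected input: " + e);
        }
    }
}
```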

Rémi


On 05/07/2011 07:55 PM, Ulf Zibis wrote:

Rémi, thanks for your feedback.

On 07.05.2011 18:00, Rémi Forax wrote:
On 05/07/2011 02:17 PM, Ulf Zibis wrote:
Hi all,
please excuse me, but I still have trouble accepting this additional class; +1 for the plural name, though.
If those charset constants are there, people _will use_ them without regard to the existing _performance disadvantages_.
A typical common use case would be: String.getBytes(...)
On small strings there is a performance loss of up to 25 % when using the Charset variant instead of the charset-name variant. See:
http://cr.openjdk.java.net/~sherman/7040220/client
http://markmail.org/message/2tbas5skgkve52mz
http://markmail.org/thread/lnrozcbnpcl5kmzs
So I still think we should have the standard charset names as constants in class j.n.c.Charset: public static final String UTF_8 = "UTF-8"; etc...
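A minimal sketch of the two call styles being compared, assuming Java 7+. The CharsetNames holder is hypothetical (java.nio.charset.Charset has no such String constants); both calls yield the same bytes, but only the name-based overload declares the checked UnsupportedEncodingException:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetNameConstants {
    // Hypothetical holder for canonical charset names, as proposed in this thread.
    static final class CharsetNames {
        static final String US_ASCII   = "US-ASCII";
        static final String ISO_8859_1 = "ISO-8859-1";
        static final String UTF_8      = "UTF-8";
        private CharsetNames() {}
    }

    public static void main(String[] args) throws Exception {
        String s = "Gr\u00FC\u00DFe";
        // Name variant: the charset is looked up by name and the method
        // declares java.io.UnsupportedEncodingException.
        byte[] byName = s.getBytes(CharsetNames.UTF_8);
        // Object variant: the charset is already resolved, no checked exception.
        byte[] byObject = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(byName, byObject));  // true
        // Charset.equals compares canonical names.
        System.out.println(Charset.forName(CharsetNames.UTF_8).equals(StandardCharsets.UTF_8)); // true
    }
}
```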
Using objects instead of strings is a better design.
I agree 50 %.
100 % would be to have:
    byte[] String.getBytes(CharsetEncoder encoder)
    String(byte[] bytes, CharsetDecoder decoder)
So, for convenience, we would consequently have to introduce constants for CharsetDecoders and CharsetEncoders in j.n.c.StandardCharsets, which would result in 12 additional classes being loaded and instantiated at once, even if only one of them is used.
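The proposed overloads do not exist in java.lang.String. A hedged sketch of equivalent static helpers (names hypothetical) built on CharsetEncoder.encode(CharBuffer) and CharsetDecoder.decode(ByteBuffer), both of which reset the coder before use, so one instance can be reused by a single thread. Note that, unlike String.getBytes, a fresh encoder reports unmappable input by throwing CharacterCodingException instead of replacing it:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CoderOverloads {
    // Hypothetical stand-in for the proposed byte[] String.getBytes(CharsetEncoder).
    static byte[] getBytes(String s, CharsetEncoder encoder) throws CharacterCodingException {
        ByteBuffer bb = encoder.encode(CharBuffer.wrap(s)); // resets, encodes, flushes
        byte[] out = new byte[bb.remaining()];              // position 0, limit = bytes written
        bb.get(out);
        return out;
    }

    // Hypothetical stand-in for the proposed String(byte[], CharsetDecoder).
    static String newString(byte[] bytes, CharsetDecoder decoder) throws CharacterCodingException {
        return decoder.decode(ByteBuffer.wrap(bytes)).toString(); // resets, decodes, flushes
    }

    public static void main(String[] args) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        byte[] b = getBytes("caf\u00E9", enc);
        System.out.println(b.length);                                 // 5
        System.out.println(newString(b, dec).equals("caf\u00E9"));    // true
    }
}
```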
But anyway, it would be better to have the canonical names of the standard charsets declared in 1 place, not in 3 (Charset, j.n.c.StandardCharsets, s.n.c.StandardCharsets).
I see the fact that the String method variants that take a Charset are slower than the ones that take a String
as a performance bug, not as a design issue.
The String method that takes a Charset should reuse the class-local decoder
and the performance problem will go away.
(The analysis in StringCoding.decode(Charset, ...) (point 1) forgets that initializing a decoder also has a cost.)
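Rémi's suggestion, sketched for a single charset known in advance: keep one decoder per thread (CharsetDecoder is stateful and not thread-safe), so the setup cost is paid once rather than on every call. The class name is hypothetical, the REPLACE actions are an assumption chosen to mimic the String constructors, and ThreadLocal.withInitial requires Java 8+:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CachedUtf8Decoder {
    // One decoder per thread, created lazily on first use in that thread.
    private static final ThreadLocal<CharsetDecoder> DECODER =
            ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE));

    static String decode(byte[] bytes) throws CharacterCodingException {
        // decode(ByteBuffer) resets the decoder first, so state from a
        // previous call (even a malformed one) does not leak into this one.
        return DECODER.get().decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(decode(new byte[]{0x61, (byte) 0xFF}).length()); // 2 ('a' + U+FFFD)
        System.out.println(decode("hello".getBytes(StandardCharsets.UTF_8))); // hello
    }
}
```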
Unfortunately this is not possible.
See the following discussion (my last post from 26.03.2009 - 00:52 CET; unfortunately this was a private conversation):
On 19.03.2009 20:02, Xueming Shen wrote:
Ulf Zibis wrote:
Isn't there any way to avoid even instantiating a new ..Array-X-coder for each invocation of StringCoding.x-code(Charset cs, ...)? Method x-code(byte/char[]) seems to be thread-safe if the replacement isn't changed, so I suppose we could cache the ..Array-X-coder.
No. An "external" charset can do whatever it likes. It might still be the same "object", and the de/encoder it "creates" might still be the same "object", or look like the same object you might have cached, but do something totally different.
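One way to reconcile caching with this objection, offered purely as an assumption and not as something proposed in this thread: reuse decoders only for charsets whose implementation class is loaded by the bootstrap class loader (Class.getClassLoader() returns null), which in practice covers core JDK charsets such as UTF-8, and always call newDecoder() for anything else. A sketch with hypothetical names, assuming Java 8+:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class TrustedDecoderCache {
    // Per-thread cache, because CharsetDecoder is stateful and not thread-safe.
    private static final ThreadLocal<Map<Charset, CharsetDecoder>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    // Assumption: a charset class from the bootstrap loader is a JDK built-in
    // whose newDecoder() behaves predictably; third-party charsets are not trusted.
    static boolean isTrusted(Charset cs) {
        return cs.getClass().getClassLoader() == null;
    }

    private static CharsetDecoder freshDecoder(Charset cs) {
        return cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    static String decode(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder dec = isTrusted(cs)
                ? CACHE.get().computeIfAbsent(cs, TrustedDecoderCache::freshDecoder)
                : freshDecoder(cs);  // external charset: never reuse its decoder
        return dec.decode(ByteBuffer.wrap(bytes)).toString(); // resets before decoding
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(isTrusted(StandardCharsets.UTF_8)); // true for a core JDK charset
        System.out.println(decode("abc".getBytes(StandardCharsets.US_ASCII), StandardCharsets.UTF_8));
    }
}
```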
At first glance a user could think that String#getBytes(byte[] buf, Charset cs) is faster than String#getBytes(byte[] buf, String csn), because he assumes that the Charset would be created internally from csn. As this is only true for the first call, there should be a *note* in the JavaDoc about the cost of those methods in comparison. Don't forget the JavaDoc of the (byte[] ...) constructors too.
Secondly, I think that ASCII and ISO-8859-1 account for a high percentage here, especially for CORBA applications, so why not have a fast shortcut in class String that doesn't internally use a Charset-X-coder, like getASCIIBytes() + getISO_8859_1Bytes(), or, more general and sophisticated:
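   // Sketch of a method inside java.lang.String; `values` is its char[].
   // mask is 0x7F for ASCII, 0xFF for ISO-8859-1. It is an int, because a
   // byte mask of (byte)0xFF would sign-extend to -1 and match every char.
   int getBytes(byte[] buf, int mask) {
       int j = 0;
       for (int i = 0; i < values.length; i++, j++) {
           if ((values[i] | mask) == mask) {  // char has no bits outside the mask
               buf[j] = (byte) values[i];
               continue;
           }
           if (Character.isHighSurrogate(values[i])
                   && i + 1 < values.length
                   && Character.isLowSurrogate(values[i + 1]))
               i++;                            // replace a whole surrogate pair once
           buf[j] = '?';                       // or default replacement
       }
       return j;
   }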
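The fast path above depends on String's private char array (values). A standalone, runnable variant that works on any String (class name hypothetical) makes the behaviour easy to check: mask 0x7F gives ASCII, 0xFF gives ISO-8859-1, and each unmappable char or valid surrogate pair becomes a single '?':

```java
import java.util.Arrays;

public class MaskedGetBytes {
    static byte[] getBytes(String s, int mask) {
        byte[] buf = new byte[s.length()];   // at most one byte per char
        int j = 0;
        for (int i = 0; i < s.length(); i++, j++) {
            char c = s.charAt(i);
            if ((c | mask) == mask) {        // char fits within the mask
                buf[j] = (byte) c;
                continue;
            }
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;                         // a surrogate pair maps to one '?'
            }
            buf[j] = '?';
        }
        return Arrays.copyOf(buf, j);        // trim if pairs were collapsed
    }

    public static void main(String[] args) {
        String s = "caf\u00E9 \u20AC \uD83D\uDE00";
        System.out.println(Arrays.toString(getBytes(s, 0x7F))); // [99, 97, 102, 63, 32, 63, 32, 63]
        System.out.println(Arrays.toString(getBytes(s, 0xFF))); // [99, 97, 102, -23, 32, 63, 32, 63]
    }
}
```

For BMP-only input the 0xFF result should match s.getBytes(StandardCharsets.ISO_8859_1), which also substitutes '?'.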

-Ulf
