On 05/07/2011 08:54 PM, Xueming Shen wrote:
On 05-07-2011 上午 9:00, Rémi Forax wrote:
On 05/07/2011 02:17 PM, Ulf Zibis wrote:
Hi all,
please excuse, that I have still problems to accept this additional
class, but +1 for the plural name.
If those charset constants are there, people _will use_ them without
respect on the existing _performance disadvantages_.
A common typical use case should be: String.getBytes(...)
On small strings there is a performance lost up to 25 % using the
charset variant vs. the charset name variant. See:
http://cr.openjdk.java.net/~sherman/7040220/client
http://markmail.org/message/2tbas5skgkve52mz
http://markmail.org/thread/lnrozcbnpcl5kmzs
So I still think, we should have the standard charset names as
constants in class j.n.c.Charset:
public static final String UTF_8 = "UTF-8"; etc...
Using objects instead of string is a better design.
I see the fact that the String method variants that takes a Charset
are slower that the ones that use a String
as a performance bug, not as a design issue.
The String method that takes a Charset should reuse the class-local
decoder
and the performance problem will go away.
(The analysis in StringCoding.decode(Charset, ...) (point 1) forget
that initializing a decoder has also a cost)
I do know the "slowness" is from initializing cs.newDe/Encoder():-)
But it is just not "easy" to cache
the de/encoder in this case. There is no guarantee that the cs passed
in this time is the same one
you had last time, even the name might be the same.
I agree. You can load a new charset this will override an existing one.
Or even the cs this time is indeed the same
instance you had last time (you did the cache), there is no guarantee
the dec/enc returned from
newDecoder()/Encoder() this time will be the same one in your cache,
until you invoke the
newDecoder()/Encoder() ,
Here, you don't care if it's the same encoder/decoder.
Any decoder/encoder will be valid, if the charset is the same (==) as
the thread-local one.
The spec allows to reuse encoder/decoder, so it's valid to reuse one
created from the same
charset. As I said, the javadoc offers no guarantee that we should
always call newEncoder/newDecoder
and even says that if you want control over the encoder/decoder you
should call newEncoder/newDecoder
yourself.
get the enc/dec and compare to the on in your cache, but then why cache
it:-) Something you can do is to do the cache if the cs passed in is
indeed the one from our own
charset repository (can be trusted that would not do something
tricky), you can do this by invoking
getClassLoader0() == null, which is expensive, I kinda remember the
measure showed this might
not be something worthing doing last time when I was there.
Yes, weirdly, getClassLoader() is not a VM intrinsic even if by example
getSuperClass() which is
far more complex than getClassLoader() is an intrinsic.
It should be a good idea to open a bug to create an intrinsic for
getClassLoader,
checking if a classloader of a class is null or not is a common pattern.
Sure, those charsets in
StandardChsets can be treated specially, if desirable, probably only
the ascii, iso8859-1 and utf8,
such as
if (cs == StandardCharsets.UTF_8 || cs == StandardCharsets.US_ASCII...) {
...
}
I think it's better to reuse the caching mechanism used if the charset
is a String.
Will do some measurement later to see if to separate the "else" part
in a side method will speed
up a little, we can do that if the inline does help, but not for 7:-)
Yes, not for 7, but 8 is around the corner.
Thanks!
-Sherman
cheers,
Rémi