Much of the documentation (especially the early material, written when supplementary characters were rare or nonexistent) doesn't distinguish clearly enough between "character (code point)" and "char". Fixing that throughout the docs would be a fine thing to do.
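As a quick illustration of the distinction (a minimal sketch; the class name is made up for the example, but the `String`/`Character` methods are standard Java API):

```java
public class CharVsCodePoint {
    public static void main(String[] args) {
        // U+1F600 is a supplementary character: ONE code point,
        // but TWO chars (a UTF-16 surrogate pair) in a Java String.
        String s = new String(Character.toChars(0x1F600));
        System.out.println(s.length());                      // 2 -- counts chars
        System.out.println(s.codePointCount(0, s.length())); // 1 -- counts code points
    }
}
```

So "per character" is ambiguous in the Javadoc: it could mean per `char` or per code point, and the two differ exactly for supplementary characters.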
On Mon, Sep 22, 2014 at 2:34 PM, Mark Thomas <ma...@apache.org> wrote:

> On 22/09/2014 22:23, Martin Buchholz wrote:
> > I think you are mistaken. It's maxBytesPerChar, not maxBytesPerCodepoint!
>
> You are going to have to explain that some more. The Javadoc for
> CharsetEncoder.maxBytesPerChar() is explicit:
> <quote>
> Returns the maximum number of bytes that will be produced for each
> character of input.
> </quote>
>
> For UTF-8 that number is 4, not 3. A quick look at the source for the
> default UTF-8 encoder confirms that there are cases where it will output
> 4 bytes for a single input character.
>
> Mark
>
> > changeset: 3116:b44704ce8a08
> > user: sherman
> > date: 2010-11-19 12:58 -0800
> > 6957230: CharsetEncoder.maxBytesPerChar() reports 4 for UTF-8; should be 3
> > Summary: changged utf-8's CharsetEncoder.maxBytesPerChar to 3
> > Reviewed-by: alanb
> >
> > On Mon, Sep 22, 2014 at 1:14 PM, Ivan Gerasimov <ivan.gerasi...@oracle.com> wrote:
> >
> >> Hello!
> >>
> >> The UTF-8 encoding allows characters that are 4 bytes long.
> >> However, CharsetEncoder.maxBytesPerChar() currently returns 3.0, which is
> >> not always enough.
> >>
> >> Would you please review the simple fix for this issue?
> >>
> >> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058875
> >> WEBREV: http://cr.openjdk.java.net/~igerasim/8058875/0/webrev/
> >>
> >> Sincerely yours,
> >> Ivan
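For what it's worth, the arithmetic behind the value 3 (from JDK-6957230) can be sketched like this. A 4-byte UTF-8 sequence only ever arises from a supplementary character, which occupies two `char`s in Java, so it costs 4/2 = 2 bytes per `char`; the per-`char` worst case is a 3-byte BMP character. A small demonstration (class name is mine; the charset API calls are standard):

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8MaxBytesDemo {
    public static void main(String[] args) {
        // Worst case per char: a BMP character in the 3-byte range,
        // e.g. U+20AC (EURO SIGN): 1 char -> 3 bytes.
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println("euro: chars=1 bytes=" + euro.length); // bytes=3

        // A supplementary character, e.g. U+1F600: 2 chars (surrogate
        // pair) -> 4 bytes, i.e. only 2 bytes per char.
        String emoji = new String(Character.toChars(0x1F600));
        byte[] b = emoji.getBytes(StandardCharsets.UTF_8);
        System.out.println("emoji: chars=" + emoji.length()
                + " bytes=" + b.length); // chars=2 bytes=4

        // Hence maxBytesPerChar() == 3.0 for UTF-8 in current JDKs.
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        System.out.println("maxBytesPerChar=" + enc.maxBytesPerChar());
    }
}
```

The practical upshot: a buffer sized as `length() * maxBytesPerChar()` is still always sufficient, because no input of n `char`s can encode to more than 3n bytes, even when individual code points take 4 bytes.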