On 22/09/2014 22:23, Martin Buchholz wrote:
> I think you are mistaken. It's maxBytesPerChar, not maxBytesPerCodepoint!

You are going to have to explain that some more. The Javadoc for
CharsetEncoder.maxBytesPerChar() is explicit:

<quote>
Returns the maximum number of bytes that will be produced for each
character of input.
</quote>

For UTF-8 that number is 4, not 3. A quick look at the source for the
default UTF-8 encoder confirms that there are cases where it will output
4 bytes for a single input character.

Mark

> changeset: 3116:b44704ce8a08
> user: sherman
> date: 2010-11-19 12:58 -0800
> 6957230: CharsetEncoder.maxBytesPerChar() reports 4 for UTF-8; should be 3
> Summary: changged utf-8's CharsetEncoder.maxBytesPerChar to 3
> Reviewed-by: alanb
>
> On Mon, Sep 22, 2014 at 1:14 PM, Ivan Gerasimov <ivan.gerasi...@oracle.com>
> wrote:
>
>> Hello!
>>
>> The UTF-8 encoding allows characters that are 4 bytes long.
>> However, CharsetEncoder.maxBytesPerChar() currently returns 3.0, which is
>> not always enough.
>>
>> Would you please review the simple fix for this issue?
>>
>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058875
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058875/0/webrev/
>>
>> Sincerely yours,
>> Ivan
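
One way to check the two readings of maxBytesPerChar() is to count, for a few sample code points, how many Java chars (UTF-16 code units) and how many UTF-8 bytes each one occupies. The sketch below does that. The class name, helper methods and sample code points are made up for illustration and come from neither the webrev nor the JDK sources. It assumes only the standard java.nio.charset and java.lang.Character APIs.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK or of the proposed fix.
final class Utf8BytesPerCharDemo {

    /** Number of Java chars (UTF-16 code units) needed for one code point. */
    static int charCount(int codePoint) {
        return Character.charCount(codePoint);
    }

    /** Number of bytes in the UTF-8 encoding of one code point. */
    static int utf8ByteCount(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        // The value the running JDK reports; this is what the thread is about.
        System.out.println("maxBytesPerChar() = " + encoder.maxBytesPerChar());

        // Samples: ASCII letter, Latin-1 letter, BMP symbol, supplementary code point.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            int chars = charCount(cp);
            int bytes = utf8ByteCount(cp);
            System.out.printf("U+%04X: %d char(s), %d byte(s), %.2f bytes per char%n",
                    cp, chars, bytes, (double) bytes / chars);
        }
    }
}
```

The supplementary code point is the interesting case: it needs four UTF-8 bytes, but it is also represented by two Java chars (a surrogate pair). Whether that counts as "4 bytes for a single input character" depends on whether "character" means a code point or a Java char.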