Hello,
I am looking at the following bug:
https://bugs.openjdk.java.net/browse/JDK-8230531
and hoping someone who is familiar with the encoder can clear things
up. As noted in the bug report, the method description reads:
--
Returns the maximum number of bytes that will be produced for each
character of input. This value may be used to compute the worst-case
size of the output buffer required for a given input sequence.
--
Initially I thought it would return the maximum number of encoded bytes
for an arbitrary input "char" value, i.e., a code unit of the UTF-16
encoding. For example, any of the UTF-16 Charsets (UTF-16, UTF-16BE, and
UTF-16LE) would return 2 from the method, as a code unit is a 16-bit
value. In reality, the encoder of the UTF-16 Charset returns 4, which
accounts for the initial byte-order mark (2 bytes for a code unit, plus
the size of the BOM). This is justifiable, though, since the value is
meant to cover the worst-case scenario. I believe this implementation
has been there since the inception of java.nio, i.e., JDK 1.4.
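
For reference, here is a small snippet (my own illustration, with a
hypothetical class name) showing the behavior I am describing; UTF-16BE
and UTF-16LE do not write a BOM, so I would expect them to report 2:

  import java.nio.charset.Charset;

  public class MaxBytesPerCharDemo {
      public static void main(String[] args) {
          // UTF-16 reports 4.0: 2 bytes per code unit, plus room for the BOM
          System.out.println(Charset.forName("UTF-16").newEncoder().maxBytesPerChar());
          // UTF-16BE and UTF-16LE write no BOM, so 2.0 is expected here
          System.out.println(Charset.forName("UTF-16BE").newEncoder().maxBytesPerChar());
          System.out.println(Charset.forName("UTF-16LE").newEncoder().maxBytesPerChar());
      }
  }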
Obviously I can clarify the spec of maxBytesPerChar() to account for
conversion-independent prefix (or suffix) bytes, such as the BOM, but I
am not sure of the original intent of the method. If it is intended to
return the pure maximum number of bytes for a single input char, UTF-16
should also have been returning 2. But in that case, the caller would
not be able to calculate the worst-case byte buffer size as described
in the bug report.
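
For context, the kind of calculation the caller relies on would be
something along these lines (a sketch with a hypothetical class name,
not the reporter's exact code):

  import java.nio.charset.Charset;
  import java.nio.charset.CharsetEncoder;

  public class WorstCaseBuffer {
      public static void main(String[] args) {
          CharsetEncoder enc = Charset.forName("UTF-16").newEncoder();
          String input = "sample input";
          // This bound is only safe if maxBytesPerChar() already covers
          // any prefix/suffix bytes such as the BOM
          int worstCase = (int) Math.ceil(input.length() * enc.maxBytesPerChar());
          byte[] buf = new byte[worstCase];
          System.out.println(buf.length);
      }
  }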
Naoto