Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-13 Thread Stephen Colebourne
Is it necessary to add the "String csname" variants at this point in
Java's lifetime? Don't most people use instances of Charset now?

Stephen


On 13 February 2018 at 06:24, Xueming Shen  wrote:
> Hi,
>
> Please help review the proposal to add the following constructors and methods
> in the String class to take ByteBuffer as the input and output data buffer.
>
> public String(ByteBuffer bytes, Charset cs);
> public String(ByteBuffer bytes, String csname);
> public int getBytes(byte[] dst, int offset, Charset cs);
> public int getBytes(byte[] dst, int offset, String csname);
> public int getBytes(ByteBuffer bytes, Charset cs);
> public int getBytes(ByteBuffer bytes, String csname);
>
> This is based on the change proposed by Richard in the jdk9 time frame [1].
> But since we were fully occupied by the compact string support work back
> then, this one did not make it into the top jdk9 priority list.
>
> issue: https://bugs.openjdk.java.net/browse/JDK-8021560
> webrev: http://cr.openjdk.java.net/~sherman/8021560/webrev
>
> The implementation in the StringCoding class is a little "duplicated" across
> the various encoding/decoding variants. There is definitely room to further
> consolidate. But I would assume it might be better to keep the new code path
> separate from the existing "fully tuned" methods, for now, to avoid possible
> performance regressions in the existing String methods.
>
> Thanks,
> Sherman
>
> [1]
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-November/029672.html


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-13 Thread Alan Bateman

On 13/02/2018 06:24, Xueming Shen wrote:

Hi,

Please help review the proposal to add the following constructors and methods
in the String class to take ByteBuffer as the input and output data buffer.

public String(ByteBuffer bytes, Charset cs);
public String(ByteBuffer bytes, String csname);
These constructors look good (for the parameter names, I assume you
meant "src" rather than "bytes" here).



public int getBytes(byte[] dst, int offset, Charset cs);
public int getBytes(byte[] dst, int offset, String csname);
public int getBytes(ByteBuffer bytes, Charset cs);
public int getBytes(ByteBuffer bytes, String csname);
These four methods encode as many characters as possible into the 
destination byte[] or buffer but don't give any indication that the 
destination didn't have enough space to encode the entire string. I thus 
worry they could be a hazard and result in buggy code. If there is 
insufficient space then the user of the API doesn't know how many 
characters were encoded so it's not easy to substring and call getBytes 
again to encode the remaining characters. There is also the issue of how 
to size the destination. What would you think about having them fail 
when there is insufficient space? If they do fail then there is a side 
effect that they will have written to the destination so that would need 
to be documented too.


-Alan





Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-13 Thread David Lloyd
On Tue, Feb 13, 2018 at 2:41 AM, Alan Bateman  wrote:
> On 13/02/2018 06:24, Xueming Shen wrote:
>>
>> Hi,
>>
>> Please help review the proposal to add the following constructors and
>> methods in the String class to take ByteBuffer as the input and output
>> data buffer.
>>
>> public String(ByteBuffer bytes, Charset cs);
>> public String(ByteBuffer bytes, String csname);
>
> These constructors look good (for the parameter names, I assume you
> meant "src" rather than "bytes" here).
>
>> public int getBytes(byte[] dst, int offset, Charset cs);
>> public int getBytes(byte[] dst, int offset, String csname);
>> public int getBytes(ByteBuffer bytes, Charset cs);
>> public int getBytes(ByteBuffer bytes, String csname);
>
> These four methods encode as many characters as possible into the
> destination byte[] or buffer but don't give any indication that the
> destination didn't have enough space to encode the entire string. I thus
> worry they could be a hazard and result in buggy code. If there is
> insufficient space then the user of the API doesn't know how many characters
> were encoded so it's not easy to substring and call getBytes again to encode
> the remaining characters. There is also the issue of how to size the
> destination. What would you think about having them fail when there is
> insufficient space? If they do fail then there is a side effect that they
> will have written to the destination so that would need to be documented
> too.

The ones that output to a ByteBuffer have more flexibility in that the
buffer position can be moved according to the number of bytes written,
but the method _could_ return the number of _chars_ actually written.
But this is not particularly useful without variants which accept an
offset into the string, unless it can be shown that
s.substring(coffs).getBytes(xxx) is reasonably efficient.

It might be better to shuffle this around a little and instead have a
Charset[Encoder].getBytes(int codePoint, byte[] b, int offs, int len) /
.getBytes(int codePoint, ByteBuffer buf) kind of thing which returns the
number of bytes, or e.g. -1 or -count if there isn't enough space in the
target.  Then it would be less onerous for users to write simple
for-each-codepoint loops which encode as far as is reasonable but no
farther, without too many error-handling gymnastics.
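
For illustration, a minimal sketch of such a loop, written against a stand-in
interface for the proposed (hypothetical) getBytes(int codePoint, ByteBuffer dst)
method - nothing below exists in the JDK today:

    import java.nio.ByteBuffer;

    // Stand-in for the proposed per-code-point encoder method (hypothetical).
    interface CodePointEncoder {
        // Returns the number of bytes written, or a negative value if dst
        // does not have room for this code point.
        int getBytes(int codePoint, ByteBuffer dst);
    }

    class EncodeLoopSketch {
        // Encodes as many whole code points of s as fit into dst and returns
        // the char index of the first code point that did not fit.
        static int encodeAsFarAsPossible(String s, CodePointEncoder enc, ByteBuffer dst) {
            int i = 0;
            while (i < s.length()) {
                int cp = s.codePointAt(i);
                if (enc.getBytes(cp, dst) < 0) {
                    break;                        // stop cleanly at a code point boundary
                }
                i += Character.charCount(cp);     // 1 or 2 chars per code point
            }
            return i;
        }
    }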

-- 
- DML


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-13 Thread Xueming Shen

On 2/13/18, 12:41 AM, Alan Bateman wrote:



public int getBytes(byte[] dst, int offset, Charset cs);
public int getBytes(byte[] dst, int offset, String csname);
public int getBytes(ByteBuffer bytes, Charset cs);
public int getBytes(ByteBuffer bytes, String csname);
These four methods encode as many characters as possible into the 
destination byte[] or buffer but don't give any indication that the 
destination didn't have enough space to encode the entire string. I 
thus worry they could be a hazard and result in buggy code. If there 
is insufficient space then the user of the API doesn't know how many 
characters were encoded so it's not easy to substring and call 
getBytes again to encode the remaining characters. There is also the 
issue of how to size the destination. What would you think about 
having them fail when there is insufficient space? If they do fail 
then there is a side effect that they will have written to the 
destination so that would need to be documented too.



Well, it's the old "how to control/manage the en/decoding state" issue we have
to deal with for this kind of API. I think we faced a similar issue when
designing the base64 de/encoder.

A possible solution is to return a negative value to indicate that the
buffer/byte[] provided is not big enough, with its magnitude being the length
needed to de/encode the whole String. So you can pass in a zero-remaining
ByteBuffer or a byte[0] to ask for the size needed, or we can go even further
and take a NULL, like the old C str methods, to return the size needed, though
I don't think we want to go that far :-)

That said, one of the motivations of these APIs is to ease de/encoding between
ByteBuffer and String, and the extra step to handle the "needed buffer size"
appears to go against that original goal. But the API can always say "please
provide a big-enough buffer, otherwise the method returns the size needed as a
negative value and the passed-in buffer might be partially filled ..."
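
For illustration, a rough sketch of how a caller would use that convention.
The helper below only mimics the proposed getBytes(ByteBuffer, Charset)
semantics on top of the existing getBytes(Charset), so it still allocates an
intermediate byte[], which the real change would avoid:

    import java.nio.ByteBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    class NegativeLengthSketch {
        // Proposed semantics: fill dst and return the byte count if it fits,
        // otherwise return the needed length as a negative value.
        static int getBytes(String s, ByteBuffer dst, Charset cs) {
            byte[] encoded = s.getBytes(cs);       // sketch only; real impl would not copy
            if (encoded.length > dst.remaining()) {
                return -encoded.length;            // caller learns the required size
            }
            dst.put(encoded);
            return encoded.length;
        }

        public static void main(String[] args) {
            ByteBuffer buf = ByteBuffer.allocate(8);
            String msg = "hello, caf\u00e9";
            int n = getBytes(msg, buf, StandardCharsets.UTF_8);
            if (n < 0) {
                buf = ByteBuffer.allocate(-n);     // grow once to the reported size
                n = getBytes(msg, buf, StandardCharsets.UTF_8);
            }
            System.out.println("encoded " + n + " bytes");
        }
    }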

-Sherman



Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-15 Thread Stuart Marks

public String(ByteBuffer bytes, Charset cs);
public String(ByteBuffer bytes, String csname);


I think these constructors make good sense. They avoid an extra copy to an 
intermediate byte[].


One issue (also mentioned by Stephen Colebourne) is whether we need the csname 
overload. Arguably it's not needed if we have the Charset overload. And the 
csname overload throws UnsupportedEncodingException, which is checked. But the 
csname overload is apparently faster, since the decoder can be cached, and it's 
unclear when this can be remedied for the Charset case.


I could go either way on this one.

**

I'd also suggest adding a CharBuffer constructor:

public String(CharBuffer cbuf)

This would be semantically equivalent to

public String(char[] value, int offset, int count)

except using the chars from the CharBuffer between the buffer's position and its 
limit.
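
For what it's worth, a small sketch of that equivalence using only existing
API (the String(CharBuffer) constructor itself is just a proposal here):

    import java.nio.CharBuffer;

    class CharBufferStringSketch {
        public static void main(String[] args) {
            CharBuffer cbuf = CharBuffer.allocate(16);
            cbuf.put("hello world");
            cbuf.flip();                                   // position = 0, limit = 11

            // What the proposed new String(cbuf) would produce, built today from
            // the chars between position and limit:
            char[] chars = new char[cbuf.remaining()];
            cbuf.duplicate().get(chars);                   // copy without moving cbuf's position
            String viaCharArray = new String(chars, 0, chars.length);

            System.out.println(viaCharArray.equals(cbuf.toString()));   // true
        }
    }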


**

Regarding the getBytes() overloads:


public int getBytes(byte[] dst, int offset, Charset cs);
public int getBytes(byte[] dst, int offset, String csname);
public int getBytes(ByteBuffer bytes, Charset cs);
public int getBytes(ByteBuffer bytes, String csname);



On 2/13/18, 12:41 AM, Alan Bateman wrote:
These four methods encode as many characters as possible into the destination 
byte[] or buffer but don't give any indication that the destination didn't 
have enough space to encode the entire string. I thus worry they could be a 
hazard and result in buggy code. If there is insufficient space then the user 
of the API doesn't know how many characters were encoded so it's not easy to 
substring and call getBytes again to encode the remaining characters. There is 
also the issue of how to size the destination. What would you think about 
having them fail when there is insufficient space? If they do fail then there 
is a side effect that they will have written to the destination so that would 
need to be documented too.


I share Alan's concern here.

If the intent is to reuse a byte[] or a ByteBuffer, then there needs to be an 
effective way to handle the case where the provided array/buffer doesn't have 
enough room to receive the encoded string. A variety of ways of dealing with 
this have been mentioned, such as throwing an exception; returning a negative 
value to indicate failure, possibly also encoding the number of bytes written; 
or even allocating a fresh array or buffer of the proper size and returning that 
instead. The caller would have to check the return value and take care to handle 
all the cases properly. This is likely to be fairly error-prone.


This also raises the question in my mind of what these getBytes() methods are 
intended for.


On the one hand, they might be useful for the caller to manage its own memory 
allocation and reuse of arrays/buffers. If so, then it's necessary for 
intermediate results from partial processing to be handled properly. If the 
destination fills up, there needs to be a way to report how much of the input 
was consumed, so that a subsequent operation can pick up where the previous one 
left off. (This was one of David Lloyd's points.) If there's sufficient room in 
the destination, there needs to be a way to report this and how much space 
remains in the destination. One could contemplate adding all this information to 
the API. This eventually leads to


CharsetEncoder.encode(CharBuffer in, ByteBuffer out, boolean endOfInput)

which has all the necessary partial progress state in the buffers.
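
For reference, a minimal sketch of that partial-progress loop; the chunk size,
charset, and consume() callback are arbitrary choices for the example:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    class EncoderChunkSketch {
        // Encodes src into fixed-size chunks, handing each filled chunk to consume().
        static void encodeInChunks(String src) {
            CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
            CharBuffer in = CharBuffer.wrap(src);
            ByteBuffer out = ByteBuffer.allocate(64);          // deliberately small destination

            CoderResult cr;
            do {
                cr = enc.encode(in, out, true);
                if (cr.isOverflow()) {                         // out is full: drain and continue
                    out.flip();
                    consume(out);
                    out.clear();
                }
            } while (!cr.isUnderflow());                       // underflow: all input consumed

            while (enc.flush(out).isOverflow()) {              // flush any internal encoder state
                out.flip();
                consume(out);
                out.clear();
            }
            out.flip();
            consume(out);                                      // final, possibly partial chunk
        }

        static void consume(ByteBuffer chunk) {
            System.out.println("chunk of " + chunk.remaining() + " bytes");
        }
    }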

On the other hand, maybe the intent of these APIs is for convenience. I'd 
observe that String already has this method:


public byte[] getBytes(Charset)

which returns the encoded bytes in a newly allocated array of the proper size. 
This is pretty convenient. It doesn't let the caller reuse a destination array 
or buffer... but that's what brings in all the partial-result edge cases.


Bottom line is that I'm not entirely sure of the use of these new getBytes() 
overloads. Maybe I've missed a use case where these work; if so, maybe somebody 
can describe it.


s'marks




Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-15 Thread Xueming Shen

On 2/15/18, 1:55 PM, Stuart Marks wrote:



On 2/13/18, 12:41 AM, Alan Bateman wrote:
These four methods encode as many characters as possible into the 
destination byte[] or buffer but don't give any indication that the 
destination didn't have enough space to encode the entire string. I 
thus worry they could be a hazard and result in buggy code. If there 
is insufficient space then the user of the API doesn't know how many 
characters were encoded so it's not easy to substring and call 
getBytes again to encode the remaining characters. There is also the 
issue of how to size the destination. What would you think about 
having them fail when there is insufficient space? If they do fail 
then there is a side effect that they will have written to the 
destination so that would need to be documented too.


I share Alan's concern here.

If the intent is to reuse a byte[] or a ByteBuffer, then there needs 
to be an effective way to handle the case where the provided 
array/buffer doesn't have enough room to receive the decoded string. A 
variety of ways of dealing with this have been mentioned, such as 
throwing an exception; returning negative value to indicate failure, 


To add a reference for the discussion: here is one of the alternatives being
discussed, to return "-length", the negative value of the space needed to
encode the whole String, as a hint for the caller to pass in an appropriately
sized buffer (only the String APIs are updated). This appears to be better than
the proposed "partial fill"/encode-as-many-as-possible approach. But it will
still force the caller to take care of the return value and manage the possible
buffer size issue themselves, which raises the question of whether the
CharsetDecoder is a better place for such a use scenario.

http://cr.openjdk.java.net/~sherman/8021560/webrev.v2

-Sherman




Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-16 Thread Ulf Zibis

Hi,

what is the difference between:
- class StringCoding
- class StringCoder
- class StringCoding.StringDecoder
- class StringCoding.StringEncoder

Why do we need so many variants of redundant code? I think it would be
useful to have some explanation in the javadoc.


Missing indentation in StringCoder.java :

 322 if (enc == null) {
 323 enc = cs.newEncoder()
 324 .onMalformedInput(CodingErrorAction.REPLACE)
 325 .onUnmappableCharacter(CodingErrorAction.REPLACE);
 326 }


-Ulf


On 16.02.2018 at 00:25, Xueming Shen wrote:


http://cr.openjdk.java.net/~sherman/8021560/webrev.v2

-Sherman







Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-16 Thread Xueming Shen

On 2/16/18, 5:58 AM, Ulf Zibis wrote:

Hi,

what is the difference of:
- class StringCoding
- class StringCoder
- class StringCoding.StringDecoder
- class StringCoding.StringEncoder

Why we need so much variants of redundant code? I think it would be 
useful to have some explanation in the javadoc.



Ulf, ignore StringCoder for now. It's an experimental version to replace
StringCoding, to reduce "some" of the redundant code.

-Sherman


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-16 Thread Alan Bateman

On 15/02/2018 21:55, Stuart Marks wrote:

:

I'd also suggest adding a CharBuffer constructor:

    public String(CharBuffer cbuf)

This would be semantically equivalent to

    public String(char[] value, int offset, int count)

except using the chars from the CharBuffer between the buffer's
position and its limit.

CharBuffer::toString already does this, so adding this constructor may
not be as useful as it initially looks.


-Alan


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-16 Thread Stuart Marks

On 2/16/18 6:14 AM, Alan Bateman wrote:

On 15/02/2018 21:55, Stuart Marks wrote:

I'd also suggest adding a CharBuffer constructor:

    public String(CharBuffer cbuf)

This would be semantically equivalent to

    public String(char[] value, int offset, int count)

except using the chars from the CharBuffer between the buffer's position and 
its limit.


CharBuffer::toString already does this so adding this constructor may not be as 
useful as it initially looks.


Seems like an argument that this function is in the wrong location.

(Not joking.)

I think CharBuffer.toString() is actually quite obscure. I note that most of the 
Buffer classes have toString() methods that report the *status* of the buffer, e.g.


java.nio.HeapByteBuffer[pos=0 lim=1000 cap=1000]

Compared to other Buffers' toString() methods, CharBuffer is the outlier: its 
toString() produces only the contents of the CharBuffer without any 
meta-information. This is rather surprising.


s'marks


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-16 Thread Richard Warburton
Hi gents,

Thanks Sherman for taking up the bat on this one - most appreciated.

These four methods encode as many characters as possible into the
 destination byte[] or buffer but don't give any indication that the
 destination didn't have enough space to encode the entire string. I thus
 worry they could be a hazard and result in buggy code. If there is
 insufficient space then the user of the API doesn't know how many
 characters were encoded so it's not easy to substring and call getBytes
 again to encode the remaining characters. There is also the issue of how to
 size the destination. What would you think about having them fail when
 there is insufficient space? If they do fail then there is a side effect
 that they will have written to the destination so that would need to be
 documented too.

>>>
>> I share Alan's concern here.
>>
>> If the intent is to reuse a byte[] or a ByteBuffer, then there needs to
>> be an effective way to handle the case where the provided array/buffer
>> doesn't have enough room to receive the decoded string. A variety of ways
>> of dealing with this have been mentioned, such as throwing an exception;
>> returning negative value to indicate failure,
>>
>
> To add the ref for the discussion. Here is one of the alternatives being
> discussed,
> to return the "-length", negative value of the space needed to encode the
> whole
> String, to hint the caller to pass in an appropriately sized buffer (only
> the String
> APIs are updated). This appears to be a better than the proposed "partial
> fill"/encode
> as many as possible approach. But it will still force the caller to take
> care of the
> return value and manage the possible buffer size issue himself, which
> triggers the
> question that whether or not the CharsetDecoder is a better place for such
> kind of
> use scenario.
>
> http://cr.openjdk.java.net/~sherman/8021560/webrev.v2
>

I think some of the context here around application level memory management
of the byte buffers is missing. The getBytes() methods were supposed to be
useful in scenarios where you have a String and you want to write the byte
encoded version of it down some kind of connection. So you're going to take
the byte[] or ByteBuffer and hand it off somewhere - either to a streaming
protocol (eg: TCP), a framed message based protocol (eg: UDP, Aeron, etc.)
or perhaps to a file. A common pattern for dealing with this kind of buffer
is to avoid trying to allocate a new ByteBuffer for every message and to
encode onto the existing buffers before writing them into something else,
for example a NIO channel.

The review is completely correct that the API's user needs to know when the
ByteBuffer isn't large enough to deal with the encoding, but I think there
are two strategies for expected API usage here if you run out of space.

1. Find out how much space you need, grow your buffer, and encode onto
the bigger buffer. This means that in the failure case the user ideally
gets to know how big a buffer is needed. I think this still works in terms
of mitigating per-message buffer allocation, as in practice it means that
you only allocate a larger buffer when a String is encoded that is longer
than any previous String that you've seen before. It isn't strictly
necessary to know how big a buffer is needed, btw - as long as failure is
indicated, an API user could employ a strategy like doubling the buffer size
and retrying. I think that's suboptimal to say the least, however, and knowing
how big a buffer needs to be is desirable.

2. Just write the bytes that you've encoded down the stream and retry with
an offset incremented by the number of characters written. This requires
that the getBytes() method encodes in terms of whole characters, rather
than running out of space partway through a character that takes up
multiple encoded bytes, and also takes a "source offset" parameter - say,
the number of characters into the String that you are. This would work
perfectly well in a streaming protocol, as sketched below. If your buffer
size is N, you encode at most N characters and write them down your Channel
in a retry loop. Anyone dealing with async NIO is probably familiar with the
concept of having a retry loop. It may also work perfectly well in a framed
message based protocol. In practice any network protocol that has fixed-size
framed messages and deals with arbitrary size encodings has to have a way to
fragment longer blobs of data into its fixed-size messages.

I think either strategy for dealing with failure is valid. The problem is
that if the API uses the return value to indicate failure, which I think is
a good idea in a low-level performance-oriented API, then it's difficult to
offer both choices to the user. (1) needs the failure return code to be the
number of bytes required for encoding. (2) needs the failure return code to
indicate how far into the String you are in order to retry. I suspect, given
this tradeoff, that Sherman's suggestion of using a -length (required number
of bytes) return value is a good idea, and just assuming API users only
attempt (1) as a solution to the too-small-buffer failure.
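
A sketch of what strategy (2) would look like at the call site, using a
stand-in interface for the hypothetical offset-taking getBytes (no such String
method exists today):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.WritableByteChannel;

    // Stand-in for a hypothetical offset-taking encoder; it must stop at whole
    // characters and report how many chars of s it consumed.
    interface OffsetEncoder {
        int getBytes(String s, int srcOffset, ByteBuffer dst);
    }

    class StreamingWriteSketch {
        static void writeString(String s, OffsetEncoder enc, WritableByteChannel ch,
                                ByteBuffer buf) throws IOException {
            int offset = 0;
            while (offset < s.length()) {
                buf.clear();
                int consumed = enc.getBytes(s, offset, buf);   // encode as much as fits
                if (consumed == 0) {
                    throw new IOException("buffer too small for next character");
                }
                offset += consumed;
                buf.flip();
                while (buf.hasRemaining()) {                   // the familiar NIO retry loop
                    ch.write(buf);
                }
            }
        }
    }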

Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-16 Thread Richard Warburton
Hi gents,

>> public String(ByteBuffer bytes, Charset cs);
>> public String(ByteBuffer bytes, String csname);
>>
>
> I think these constructors make good sense. They avoid an extra copy to an
> intermediate byte[].
>
> One issue (also mentioned by Stephen Colebourne) is whether we need the
> csname overload. Arguably it's not needed if we have the Charset overload.
> And the csname overload throws UnsupportedEncodingException, which is
> checked. But the csname overload is apparently faster, since the decoder
> can be cached, and it's unclear when this can be remedied for the Charset
> case
>
> I could go either way on this one.
>

Yes - the original motivation for the csname overload was the ability to
cache the decoder and also for consistency with other String encoding
methods that normally have both the String name and also the Charset. If
memory serves me correctly it was concluded that it was possible to cache
the decoder iff the decoder could be trusted to not hold a reference to the
formerly char[], now byte[], that sits within the String. Otherwise a rogue
decoder could create a mutable string by holding a reference to its
internal state for some malicious purpose. JDK charsets aren't rogue so it
would be possible to check whether a charset was a JDK one and perform the
caching, should that be a desirable + worthwhile optimization.

regards,

  Richard Warburton

  http://insightfullogic.com
  @RichardWarburto 


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-16 Thread Xueming Shen

On 02/16/2018 03:04 PM, Richard Warburton wrote:

Hi gents,

public String(ByteBuffer bytes, Charset cs);
public String(ByteBuffer bytes, String csname);


I think these constructors make good sense. They avoid an extra copy to an 
intermediate byte[].

One issue (also mentioned by Stephen Colebourne) is whether we need the 
csname overload. Arguably it's not needed if we have the Charset overload. And 
the csname overload throws UnsupportedEncodingException, which is checked. But 
the csname overload is apparently faster, since the decoder can be cached, and 
it's unclear when this can be remedied for the Charset case

I could go either way on this one.


Yes - the original motivation for the csname overload was the ability to cache 
the decoder and also for consistency with other String encoding methods that 
normally have both the String name and also the Charset. If memory serves me 
correctly it was concluded that it was possible to cache the decoder iff the 
decoder could be trusted to not hold a reference to the formerly char[], now 
byte[], that sits within the String. Otherwise a rogue decoder could create a 
mutable string by holding a reference to its internal state for some malicious 
purpose. JDK charsets aren't rogue so it would be possible to check whether a 
charset was a JDK one and perform the caching, should that be a desirable + 
worthwhile optimization.



Yes, we don't cache external cs now (and yes, we still have to make some
defensive copies here and there :-))

The StringCoder.java included in the v2 webrev tries to cache the de/encoder
for the JDK's own charsets. So if things go as expected (rounds of jmh
measurement), there should be no performance difference between csn and cs. In
that case, should we drop the public String(ByteBuffer, String) constructor, as
Stephen suggested? Though the caller would then need to get the charset
somewhere themselves, from Charset.forName() for example.

http://cr.openjdk.java.net/~sherman/8021560/StringCoder.java 


-sherman


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-16 Thread Alan Bateman

On 16/02/2018 21:56, Stuart Marks wrote:

:

Seems like an argument that this function is in the wrong location.

(Not joking.)

I think CharBuffer.toString() is actually quite obscure. I note that 
most of the Buffer classes have toString() methods that report the 
*status* of the buffer, e.g.


    java.nio.HeapByteBuffer[pos=0 lim=1000 cap=1000]

Compared to other Buffers' toString() methods, CharBuffer is the
outlier: its toString() produces only the contents of the CharBuffer
without any meta-information. This is rather surprising.

toString() is specified by CharSequence, so the CharBuffer implementation
has to return a String containing the characters in the buffer.


-Alan


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-17 Thread Alan Bateman

On 16/02/2018 22:55, Richard Warburton wrote:

:
I think some of the context here around application level memory management
of the byte buffers is missing. The getBytes() methods were supposed to be
useful in scenarios where you have a String and you want to write the byte
encoded version of it down some kind of connection. So you're going to take
the byte[] or ByteBuffer and hand it off somewhere - either to a streaming
protocol (eg: TCP), a framed message based protocol (eg: UDP, Aeron, etc.)
or perhaps to a file. A common pattern for dealing with this kind of buffer
is to avoid trying to allocate a new ByteBuffer for every message and to
encode onto the existing buffers before writing them into something else,
for example a NIO channel.

The review is completely correct that the API's user needs to know when the
ByteBuffer isn't large enough to deal with the encoding, but I think there
are two strategies for expected API usage here if you run out of space.

1. Find out how much space you need, grow your buffer size and encode onto
the bigger buffer. So this means that in the failure case the user ideally
gets to know how big a buffer you need. I think this still works in terms
of mitigating per message buffer allocation as in practice it means that
you only allocate a larger buffer when a String is encoded that is longer
than any previous String that you've seen before. It isn't strictly
necessary to know how big a buffer is needed btw - as long as failure is
indicated an API user could employ a strategy like double the buffer size
and retry. I think that's suboptimal to say the least, however, and knowing
how big a buffer needs to be is desirable.

2. Just write the bytes that you've encoded down the stream and retry with
an offset incremented by the number of characters written. This requires
that the getBytes() method encodes in terms of whole characters, rather
than running out of space when encoding say a character that takes up
multiple bytes encoded and also takes a "source offset" parameter - say the
number of characters into the String that you are? This would work
perfectly well in a streaming protocol. If your buffer size is N, you
encode max N characters and write them down your Channel in a retry loop.
Anyone dealing with async NIO is probably familiar with the concept of
having a retry loop. It may also work perfectly well in a framed message
based protocol. In practice any network protocol that has fixed-size framed
messages and deals with arbitrary size encodings has to have a way to
fragment longer-length blobs of data into its fixed size messages.

I think either strategy for dealing with failure is valid, the problem is
that if the API uses the return value to indicate failure, which I think is
a good idea in a low-level performance oriented API then its difficult to
offer both choices to the user. (1) needs the failure return code to be the
number of bytes required for encoding. (2) needs the failure return code to
indicate how far into the String you are in order to retry. I suspect given
this tradeoff that Sherman's suggestion of using a -length (required number
of bytes) return value is a good idea and just assuming API users only
attempt (1) as a solution to the too-small-buffer failure.

Just to add that the existing low-level / advanced API for this is 
CharsetEncoder. The CoderResult from an encode and the buffer positions 
mean you know when there is overflow, the number of characters encoded, 
and how many bytes were added to the buffer. It also gives fine control 
over how encoding errors should be handled, and you can cache a CharsetEncoder 
to avoid some of the performance anomalies that come up in the Charset 
vs. charset name discussions. This is not an API that most developers 
will ever use directly, but if the use-case is advanced cases (libraries 
or frameworks doing their own memory management as you mention above) 
then it might be an alternative to look at, to avoid adding advanced 
use-case APIs to String. I don't think an encode(String, ByteBuffer) 
would look out of place, although it would need a way to return the 
count of characters encoded as part of the result.


-Alan.


Re: RFR: JDK-8021560,(str) String constructors that take ByteBuffer

2018-02-17 Thread Robert Muir
On Sat, Feb 17, 2018 at 4:05 AM, Alan Bateman  wrote:

> Just to add that the existing low-level / advanced API for this is
> CharsetEncoder. The CoderResult from an encode and the buffer positions
> means you know when there is overflow, the number of characters encoded, and
> how many bytes were added to the buffer. It also gives fine control on how
> encoding errors should be handled and you cache a CharsetEncoder to avoid
> some of the performance anomalies that come up in the Charset vs. charset
> name discussions.

> This is not an API that most developers will ever use
> directly but if the use-case is advanced cases (libraries or frameworks
>

Really? How else are you supposed to convert bytes <-> characters
reliably in Java without using CharsetEncoder/Decoder? It's the only
way to get an exception when something is wrong, instead of silently
masking errors with replacement characters (the JDK seems to think it's
doing people favors there; it's not).
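
For context, a small sketch of the difference being pointed at here; the input
bytes are just an arbitrary malformed UTF-8 example:

    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    class StrictDecodeSketch {
        public static void main(String[] args) {
            byte[] bad = {(byte) 0xC3, (byte) 0x28};           // invalid UTF-8 sequence

            // Convenience path: malformed input is silently replaced with U+FFFD.
            System.out.println(new String(bad, StandardCharsets.UTF_8));

            // Strict path: REPORT turns malformed input into an exception.
            CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                dec.decode(ByteBuffer.wrap(bad));
            } catch (CharacterCodingException e) {
                System.out.println("decode failed: " + e);     // MalformedInputException
            }
        }
    }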