Hi gents,

Thanks, Sherman, for taking up the bat on this one - most appreciated.
>>>> These four methods encode as many characters as possible into the
>>>> destination byte[] or buffer but don't give any indication that the
>>>> destination didn't have enough space to encode the entire string. I thus
>>>> worry they could be a hazard and result in buggy code. If there is
>>>> insufficient space then the user of the API doesn't know how many
>>>> characters were encoded so it's not easy to substring and call getBytes
>>>> again to encode the remaining characters. There is also the issue of how
>>>> to size the destination. What would you think about having them fail when
>>>> there is insufficient space? If they do fail then there is a side effect
>>>> that they will have written to the destination so that would need to be
>>>> documented too.
>>
>> I share Alan's concern here.
>>
>> If the intent is to reuse a byte[] or a ByteBuffer, then there needs to be
>> an effective way to handle the case where the provided array/buffer doesn't
>> have enough room to receive the decoded string. A variety of ways of
>> dealing with this have been mentioned, such as throwing an exception or
>> returning a negative value to indicate failure.
>
> To add the ref for the discussion, here is one of the alternatives being
> discussed: to return the "-length", the negative value of the space needed
> to encode the whole String, to hint the caller to pass in an appropriately
> sized buffer (only the String APIs are updated). This appears to be better
> than the proposed "partial fill"/encode-as-many-as-possible approach. But it
> will still force the caller to take care of the return value and manage the
> possible buffer size issue himself, which triggers the question of whether
> or not the CharsetDecoder is a better place for this kind of use scenario.
>
> http://cr.openjdk.java.net/~sherman/8021560/webrev.v2

I think some of the context here around application-level memory management
of the byte buffers is missing.
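As a point of reference for the CharsetDecoder question quoted above: the existing CharsetEncoder API already expresses the "encode as many characters as possible" contract through CoderResult.OVERFLOW, and callers handle it with a retry loop. A minimal sketch (the string, frame size, and class name are mine, purely for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class EncoderLoop {
    // Encode s in frames of frameSize bytes, the way a caller with a
    // fixed-size buffer and a retry loop would; returns total bytes produced.
    static int encodeInFrames(String s, int frameSize) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer src = CharBuffer.wrap(s);
        ByteBuffer dst = ByteBuffer.allocate(frameSize);
        int total = 0;
        while (true) {
            CoderResult cr = enc.encode(src, dst, true);
            dst.flip();
            total += dst.remaining(); // a real caller would write dst to a channel here
            dst.clear();
            if (cr.isUnderflow()) {
                break; // the whole string has been consumed
            }
            // cr.isOverflow(): dst filled up mid-string; loop for the next frame.
            // Note the encoder never splits a character across frames.
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(encodeInFrames("hello, ünïcode world", 8));
    }
}
```

Note that the encoder tracks its own position in the source, so the caller never needs to know how many characters made it into each frame - which is exactly the bookkeeping the proposed String methods push onto the user.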
The getBytes() methods were supposed to be useful in scenarios where you have
a String and you want to write the byte-encoded version of it down some kind
of connection. So you're going to take the byte[] or ByteBuffer and hand it
off somewhere: either to a streaming protocol (e.g. TCP), a framed,
message-based protocol (e.g. UDP, Aeron, etc.) or perhaps to a file. A common
pattern for dealing with this kind of buffer is to avoid allocating a new
ByteBuffer for every message and instead to encode onto existing buffers
before writing them into something else, for example an NIO channel.

The review is completely correct that the API's user needs to know when the
ByteBuffer isn't large enough for the encoding, but I think there are two
strategies for expected API usage if you run out of space.

1. Find out how much space you need, grow your buffer and encode onto the
bigger buffer. This means that in the failure case the user ideally gets to
know how big a buffer is needed. I think this still works in terms of
mitigating per-message buffer allocation, as in practice it means you only
allocate a larger buffer when a String is encoded that is longer than any
String you've seen before. It isn't strictly necessary to know how big a
buffer is needed, btw - as long as failure is indicated, an API user could
employ a strategy like doubling the buffer size and retrying. I think that's
suboptimal to say the least, however, and knowing how big the buffer needs to
be is desirable.

2. Just write the bytes that you've encoded down the stream and retry with an
offset incremented by the number of characters written. This requires that
the getBytes() method encodes in terms of whole characters - rather than
running out of space partway through a character that encodes to multiple
bytes - and also takes a "source offset" parameter, say the number of
characters into the String that you are. This would work perfectly well in a
streaming protocol.
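To make strategy (1) concrete, here is a minimal sketch of the "-length" convention and the caller-side grow-and-retry pattern. encodeInto is a hypothetical stand-in for the proposed String API - the names are mine, and a real implementation would encode in place rather than going through an intermediate array - but the return-value contract matches Sherman's webrev: bytes written on success, negative required size on failure.

```java
import java.nio.charset.StandardCharsets;

public class NegativeLengthSketch {
    // Hypothetical stand-in for the proposed String.getBytes overload:
    // returns bytes written on success, or -(bytes required) if dst is too
    // small. (Uses an allocating detour purely to demonstrate the semantics.)
    static int encodeInto(String s, byte[] dst, int off) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > dst.length - off) {
            return -encoded.length; // hint: caller needs this many bytes
        }
        System.arraycopy(encoded, 0, dst, off, encoded.length);
        return encoded.length;
    }

    // Caller-side pattern: reuse one buffer, grow it only when a message
    // arrives that is longer than anything seen before.
    static byte[] buffer = new byte[16];

    static int writeMessage(String s) {
        int n = encodeInto(s, buffer, 0);
        if (n < 0) {
            buffer = new byte[-n]; // grow to exactly the required size
            n = encodeInto(s, buffer, 0);
        }
        // ... hand buffer[0..n) off to the channel here ...
        return n;
    }

    public static void main(String[] args) {
        System.out.println(writeMessage("short"));
        System.out.println(writeMessage("a string longer than sixteen bytes"));
    }
}
```

In the steady state the retry branch is never taken, so the happy path stays allocation-free - which is the point of reusing the buffer in the first place.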
If your buffer size is N, you encode at most N characters and write them down
your Channel in a retry loop. Anyone dealing with async NIO is probably
familiar with the concept of a retry loop. It may also work perfectly well in
a framed, message-based protocol: in practice, any network protocol that has
fixed-size framed messages and deals with arbitrary-size encodings has to
have a way to fragment longer blobs of data into its fixed-size messages.

I think either strategy for dealing with failure is valid. The problem is
that if the API uses the return value to indicate failure - which I think is
a good idea in a low-level, performance-oriented API - then it's difficult to
offer both choices to the user. (1) needs the failure return code to be the
number of bytes required for encoding; (2) needs the failure return code to
indicate how far into the String you are, in order to retry.

I suspect, given this tradeoff, that Sherman's suggestion of using a -length
(required number of bytes) return value is a good idea, and that we should
just assume API users only attempt (1) as a solution to the too-small-buffer
failure.

regards,

Richard Warburton

http://insightfullogic.com
@RichardWarburto <http://twitter.com/richardwarburto>