Re: JEP 254: Compact Strings - length limits
On Sep 6, 2016, at 2:18 PM, Tim Ellison wrote:
>
> People stash all sorts of things in (immutable) Strings. Reducing the
> limits in JDK9 seems like a regression. Was there any consideration to
> using the older Java 8 StringCoding APIs for UTF-16 strings (already
> highly perf tuned) and adding additional methods for compact strings
> rather than rewriting everything as byte[]'s?

It doesn't help now, but https://bugs.openjdk.java.net/browse/JDK-8161256 proposes a better way to stash immutable bits, CONSTANT_Data. (Caveat: language bindings not yet included.) Eventually we'll get there.

— John
Re: JEP 254: Compact Strings - length limits
On 9/6/16, 2:18 PM, Tim Ellison wrote:
>> Do we have a real use case that is impacted by this change?
>
> People stash all sorts of things in (immutable) Strings. Reducing the
> limits in JDK9 seems like a regression. Was there any consideration to
> using the older Java 8 StringCoding APIs for UTF-16 strings (already
> highly perf tuned) and adding additional methods for compact strings
> rather than rewriting everything as byte[]'s?

Hi Tim,

I'm sorry, I don't get the idea of "using StringCoding APIs for UTF-16 strings"; can you explain in a little more detail? We did try various approaches (byte[] + flag, byte[] + coder, char[] + coder, etc.); the current one appears to be the best so far based on our measurements.

Regards,
Sherman
Re: JEP 254: Compact Strings - length limits
On 06/09/16 19:04, Xueming Shen wrote:
> On 9/6/16, 10:09 AM, Tim Ellison wrote:
>> Has it been noted that while JEP 254 reduces the space occupied by
>> one-byte-per-character strings, moving from a char[] to byte[]
>> representation universally means that the maximum length of a UTF-16
>> (two bytes per char) string is now halved?

Hey Sherman,

> Yes, it's a known "limit" given the nature of the approach. It is
> not considered to be an "incompatible change", because the max length
> the String class and the corresponding buffer/builder classes can
> support is really an implementation detail, not a spec requirement.

Don't confuse spec compliance with compatibility. Of course, the JEP should not break the formally specified behavior of String etc., but the goal was to ensure that the implementation be compatible with prior behavior. As you know, there are many places where compatible behavior beyond the spec is important to maintain.

> The conclusion from the discussion back then was this is something we
> can trade off for the benefits we gain from the approach.

Out of curiosity, where was that? I did search for previous discussion of this topic but didn't see it -- it may be just my poor search foo.

> Do we have a real use case that is impacted by this change?

People stash all sorts of things in (immutable) Strings. Reducing the limits in JDK9 seems like a regression. Was there any consideration to using the older Java 8 StringCoding APIs for UTF-16 strings (already highly perf tuned) and adding additional methods for compact strings rather than rewriting everything as byte[]'s?

Regards,
Tim

>> Since the goal is "preserving full compatibility", this has been missed
>> by failing to allow for UTF-16 strings of length greater than
>> Integer.MAX_VALUE / 2.
>>
>> Regards,
>> Tim
Re: JEP 254: Compact Strings - length limits
On 9/6/16, 12:58 PM, Charles Oliver Nutter wrote:
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
>
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support
>> is really an implementation detail, not a spec requirement. The
>> conclusion from the discussion back then was this is something we can
>> trade off for the benefits we gain from the approach.
>> Do we have a real use case that is impacted by this change?
>
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
> OpenJDK 9?

Yes, true. But arguably, code that uses huge Strings should have fallback code to handle the potential OOM error when the VM can't handle the size, as there is really no guarantee that the VM can handle a String longer than Integer.MAX_VALUE / 2.

> Not that such a case is a particularly good pattern, but I'm sure there's
> code out there doing it. On JRuby we routinely get bug reports complaining
> that we can't support strings larger than 2GB (and we have used byte[] for
> strings since 2006).

That was a trade-off decision to make. Does JRuby have any better solution for such complaints? Did you ever consider going back to char[] to "fix" the problem, or some workaround such as adding another byte[], for example?

btw, single-byte-only strings should work just fine :-) or :-( depending on the character set used.

Sherman
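[Editor's note: a minimal sketch of the fallback pattern Sherman describes. The class and method names (HugeStrings, tryRepeat) are hypothetical, not from the JDK; the point is only that code building very large Strings can catch the OutOfMemoryError and degrade gracefully instead of assuming the VM will always cope.]

```java
import java.util.Optional;

public class HugeStrings {
    // Attempt to build a String of `count` copies of `c`; return empty
    // if the VM cannot allocate backing storage of that size.
    static Optional<String> tryRepeat(char c, int count) {
        try {
            StringBuilder sb = new StringBuilder(count);
            for (int i = 0; i < count; i++) {
                sb.append(c);
            }
            return Optional.of(sb.toString());
        } catch (OutOfMemoryError e) {
            // Caller falls back, e.g. to a streaming or chunked representation.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A small, safe size; a non-Latin-1 char forces UTF-16 storage under JEP 254.
        System.out.println(tryRepeat('\u4e2d', 1_000).map(String::length).orElse(-1)); // prints 1000
    }
}
```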
Re: JEP 254: Compact Strings - length limits
On Sep 6, 2016, at 12:58 PM, Charles Oliver Nutter wrote:
>
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
>
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support
>> is really an implementation detail, not a spec requirement. The
>> conclusion from the discussion back then was this is something we can
>> trade off for the benefits we gain from the approach.
>> Do we have a real use case that is impacted by this change?
>
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
> OpenJDK 9?
>
> Not that such a case is a particularly good pattern, but I'm sure there's
> code out there doing it. On JRuby we routinely get bug reports complaining
> that we can't support strings larger than 2GB (and we have used byte[] for
> strings since 2006).
>
> - Charlie

The most basic scale requirement for strings is that they support class-file constants, which top out at a UTF8 length of 2**16. Lengths beyond that, up to the 'int' return value of String::length, are less well specified.

FTR, we could have chosen char[], int[], or long[] (not byte[]) as the backing store for string data. With long[] we could have strings above 4G chars. But it would have come with a perf tax, since the T[].length field would need to be combined with an extra bit or two (from a flag byte) to compute the full length. That's 2-3 extra instructions for loading a string length, or else a redundant length field. So it's a trade-off.

Likewise, choosing a third format deepens branch depth in order to get to the payload. Likewise, making the second format (of two) carry a length field embedded in the payload section requires a conditional load or branch in order to load the string length. Again, more instructions.

The team has looked at 20 possibilities like these. The current design is the fastest. I hope it flies.

— John
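[Editor's note: a simplified sketch of the length-loading cost John is weighing. In the chosen two-format design, the char count can be recovered from the byte[] length and a coder value with a single shift and no branch; the class below is illustrative only, not the actual JDK source.]

```java
public class CoderLength {
    // Coder values chosen so that a shift recovers the char count:
    // Latin-1 stores one byte per char, UTF-16 stores two.
    static final byte LATIN1 = 0;
    static final byte UTF16  = 1;

    // char length = byte length >> coder: one shift, no extra branch
    // and no redundant length field.
    static int length(byte[] value, byte coder) {
        return value.length >> coder;
    }

    public static void main(String[] args) {
        byte[] latin1 = new byte[5];   // e.g. "hello" stored compactly
        byte[] utf16  = new byte[10];  // e.g. "hello" stored as UTF-16
        System.out.println(length(latin1, LATIN1)); // prints 5
        System.out.println(length(utf16, UTF16));   // prints 5
    }
}
```

A long[] backing store or a third format would break this property: the length would need extra flag bits or a conditional load, which is the 2-3 instruction tax described above.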
Re: JEP 254: Compact Strings - length limits
On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
> Yes, it's a known "limit" given the nature of the approach. It is not
> considered to be an "incompatible change", because the max length the
> String class and the corresponding buffer/builder classes can support
> is really an implementation detail, not a spec requirement. The
> conclusion from the discussion back then was this is something we can
> trade off for the benefits we gain from the approach.
>
> Do we have a real use case that is impacted by this change?

Well, doesn't this mean that any code out there consuming String data that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on OpenJDK 9?

Not that such a case is a particularly good pattern, but I'm sure there's code out there doing it. On JRuby we routinely get bug reports complaining that we can't support strings larger than 2GB (and we have used byte[] for strings since 2006).

- Charlie
Re: JEP 254: Compact Strings - length limits
On 9/6/16, 10:09 AM, Tim Ellison wrote:
> Has it been noted that while JEP 254 reduces the space occupied by
> one-byte-per-character strings, moving from a char[] to byte[]
> representation universally means that the maximum length of a UTF-16
> (two bytes per char) string is now halved?
>
> Since the goal is "preserving full compatibility", this has been missed
> by failing to allow for UTF-16 strings of length greater than
> Integer.MAX_VALUE / 2.
>
> Regards,
> Tim

Hi Tim,

Yes, it's a known "limit" given the nature of the approach. It is not considered to be an "incompatible change", because the max length the String class and the corresponding buffer/builder classes can support is really an implementation detail, not a spec requirement. The conclusion from the discussion back then was that this is something we can trade off for the benefits we gain from the approach.

Do we have a real use case that is impacted by this change?

Thanks,
Sherman
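[Editor's note: the back-of-envelope arithmetic behind the halved limit, assuming only that a Java array length is an int (so a byte[] holds at most roughly Integer.MAX_VALUE bytes, just as a JDK 8 char[] held at most roughly Integer.MAX_VALUE chars).]

```java
public class StringLimits {
    public static void main(String[] args) {
        int maxArrayish = Integer.MAX_VALUE;      // approximate ceiling on any array length
        int maxLatin1Chars = maxArrayish;         // compact form: 1 byte per char
        int maxUtf16Chars  = maxArrayish / 2;     // UTF-16 form: 2 bytes per char

        // A UTF-16 string's ceiling is half of the old char[]-backed ceiling.
        System.out.println(maxLatin1Chars); // prints 2147483647
        System.out.println(maxUtf16Chars);  // prints 1073741823
    }
}
```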