Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread John Rose
On Sep 6, 2016, at 2:18 PM, Tim Ellison wrote:
> 
> People stash all sorts of things in (immutable) Strings. Reducing the
> limits in JDK9 seems like a regression.  Was there any consideration to
> using the older Java 8 StringCoding APIs for UTF-16 strings (already
> highly perf tuned) and adding additional methods for compact strings
> rather than rewriting everything as byte[]'s?

It doesn't help now, but https://bugs.openjdk.java.net/browse/JDK-8161256
proposes a better way to stash immutable bits, CONSTANT_Data.
(Caveat:  Language bindings not yet included.)  Eventually we'll get there.

— John

Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Xueming Shen

On 9/6/16, 2:18 PM, Tim Ellison wrote:
>> Do we have a real use case that is impacted by this change?
>
> People stash all sorts of things in (immutable) Strings. Reducing the
> limits in JDK9 seems like a regression.  Was there any consideration to
> using the older Java 8 StringCoding APIs for UTF-16 strings (already
> highly perf tuned) and adding additional methods for compact strings
> rather than rewriting everything as byte[]'s?

Hi Tim,

I'm sorry, I don't get the idea of "using StringCoding APIs for UTF-16 strings".
Can you explain it in a little more detail? We did try various approaches
(byte[] + flag, byte[] + coder, coder, char[] + coder, etc.), and the current
one appears to be the best so far based on our measurements.

Regards,
Sherman
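
For readers following along, here is a much-simplified, hypothetical model of the "byte[] + coder" layout mentioned above. The class CoderStringSketch and its constructor are illustrative only, not JDK code; the sketch assumes a coder of 0 for one byte per char (Latin-1) and 1 for two bytes per char (UTF-16), and derives the length from the array with a single shift.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of a "byte[] + coder" string.
// Not the actual java.lang.String source; names are illustrative.
final class CoderStringSketch {
    static final byte LATIN1 = 0; // one byte per char
    static final byte UTF16 = 1;  // two bytes per char

    private final byte[] value;
    private final byte coder;

    CoderStringSketch(String s) {
        // Use the compact form only if every char fits in a single byte.
        if (s.chars().allMatch(c -> c <= 0xFF)) {
            value = s.getBytes(StandardCharsets.ISO_8859_1);
            coder = LATIN1;
        } else {
            // UTF-16BE writes exactly two bytes per char and no byte-order mark.
            value = s.getBytes(StandardCharsets.UTF_16BE);
            coder = UTF16;
        }
    }

    // Length comes from the array length with one shift:
    // LATIN1 -> value.length, UTF16 -> value.length / 2.
    int length() {
        return value.length >> coder;
    }

    byte coder() {
        return coder;
    }

    public static void main(String[] args) {
        CoderStringSketch a = new CoderStringSketch("hello");
        CoderStringSketch b = new CoderStringSketch("h\u20ACllo"); // contains the euro sign
        System.out.println(a.length() + " chars, coder " + a.coder()); // 5 chars, coder 0
        System.out.println(b.length() + " chars, coder " + b.coder()); // 5 chars, coder 1
    }
}
```

Because a UTF-16 string spends two array bytes per char, the same maximum byte[] length holds only half as many chars.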



Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Tim Ellison
On 06/09/16 19:04, Xueming Shen wrote:
> On 9/6/16, 10:09 AM, Tim Ellison wrote:
>> Has it been noted that while JEP 254 reduces the space occupied by one
>> byte per character strings, moving from a char[] to byte[]
>> representation universally means that the maximum length of a UTF-16
>> (two bytes per char) string is now halved?

Hey Sherman,

> Yes, it's a known "limit" given the nature of the approach. It is
> not considered to be an "incompatible change", because the max length
> the String class and the corresponding buffer/builder classes can
> support is really an implementation detail, not a spec requirement.

Don't confuse spec compliance with compatibility.  Of course, the JEP
should not break the formally specified behavior of String etc., but the
goal was to ensure that the implementation be compatible with prior
behavior. As you know, there are many places where compatible behavior
beyond the spec is important to maintain.

> The conclusion from the discussion back then was that this is something
> we can trade off for the benefits we gain from the approach.

Out of curiosity, where was that?  I did search for previous discussion
of this topic but didn't see it -- it may be just my poor search-fu.

> Do we have a real use case that is impacted by this change?

People stash all sorts of things in (immutable) Strings. Reducing the
limits in JDK9 seems like a regression.  Was there any consideration to
using the older Java 8 StringCoding APIs for UTF-16 strings (already
highly perf tuned) and adding additional methods for compact strings
rather than rewriting everything as byte[]'s?

Regards,
Tim

>> Since the goal is "preserving full compatibility", this has been missed
>> by failing to allow for UTF-16 strings of length greater than
>> Integer.MAX_VALUE / 2.
>>
>> Regards,
>> Tim


Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Xueming Shen

On 9/6/16, 12:58 PM, Charles Oliver Nutter wrote:
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
>
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support is
>> really an implementation detail, not a spec requirement. The conclusion
>> from the discussion back then was that this is something we can trade off
>> for the benefits we gain from the approach.
>> Do we have a real use case that is impacted by this change?
>
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing
> on OpenJDK 9?

Yes, true. But arguably, code that uses huge Strings should have fallback
code to handle a potential OutOfMemoryError when the VM can't handle the
size, as there is really no guarantee that the VM can handle a String
longer than Integer.MAX_VALUE / 2.

> Not that such a case is a particularly good pattern, but I'm sure
> there's code out there doing it. On JRuby we routinely get bug reports
> complaining that we can't support strings larger than 2GB (and we have
> used byte[] for strings since 2006).



That was a trade-off decision to make.

Does JRuby have a better solution for such complaints? Have you ever
considered going back to char[] to "fix" the problem, or some workaround
such as adding another byte[]?

BTW, single-byte-only strings should still work just fine :-) or :-(,
depending on the character set used.

Sherman
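
One possible shape for the fallback suggested above, sketched under the assumption that the oversized text arrives through a java.io.Reader: keep it as a list of bounded Strings rather than one giant String. ChunkedText, readChunks and maxChunkChars are hypothetical names, and the chunk size is left to the caller.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical fallback: read large text as a list of bounded Strings,
// so no single String ever has to exceed maxChunkChars chars.
final class ChunkedText {
    static List<String> readChunks(Reader in, int maxChunkChars) {
        if (maxChunkChars <= 0) {
            throw new IllegalArgumentException("maxChunkChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        char[] buf = new char[maxChunkChars];
        try {
            int n;
            // read() may return fewer chars than requested; each call yields one chunk.
            while ((n = in.read(buf, 0, maxChunkChars)) != -1) {
                if (n > 0) {
                    chunks.add(new String(buf, 0, n));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A surrogate pair may straddle two chunks; concatenating the chunks
        // in order restores the original text.
        return chunks;
    }

    public static void main(String[] args) {
        List<String> parts = readChunks(new StringReader("abcdefg"), 3);
        System.out.println(parts); // [abc, def, g]
    }
}
```

A real application would pick a chunk size well below Integer.MAX_VALUE / 2 so that UTF-16 content fits as well.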


Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread John Rose
On Sep 6, 2016, at 12:58 PM, Charles Oliver Nutter wrote:
> 
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
> 
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support is
>> really an implementation detail, not a spec requirement. The conclusion
>> from the discussion back then was that this is something we can trade off
>> for the benefits we gain from the approach.
>> Do we have a real use case that is impacted by this change?
>> 
> 
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
> OpenJDK 9?
> 
> Not that such a case is a particularly good pattern, but I'm sure there's
> code out there doing it. On JRuby we routinely get bug reports complaining
> that we can't support strings larger than 2GB (and we have used byte[] for
> strings since 2006).
> 
> - Charlie

The most basic scale requirement for strings is that they support class-file
constants, which top out at a UTF8-length of 2**16.  Lengths beyond that,
to fill up the 'int' return value of String::length, are less well specified.
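
To make that baseline concrete: a class-file CONSTANT_Utf8 entry stores its byte count in an unsigned 16-bit field, so its "modified UTF-8" encoding (the same encoding DataOutputStream.writeUTF uses) can be at most 65535 bytes. ClassFileUtf8 below is a hypothetical helper that measures a String against that cap.

```java
import java.util.Arrays;

// Hypothetical helper: bytes needed to store a String as a CONSTANT_Utf8 entry.
// Modified UTF-8 rules, applied per UTF-16 char:
//   U+0001..U+007F          -> 1 byte
//   U+0000, U+0080..U+07FF  -> 2 bytes
//   everything else (including each half of a surrogate pair) -> 3 bytes
final class ClassFileUtf8 {
    static final int MAX_CONSTANT_UTF8_BYTES = 0xFFFF; // unsigned 16-bit length field

    static long modifiedUtf8Length(String s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;
            } else if (c <= 0x07FF) {
                bytes += 2; // also covers U+0000, which is never written as a single zero byte
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    static boolean fitsInConstantPool(String s) {
        return modifiedUtf8Length(s) <= MAX_CONSTANT_UTF8_BYTES;
    }

    public static void main(String[] args) {
        char[] filler = new char[65535];
        Arrays.fill(filler, 'a');
        System.out.println(fitsInConstantPool(new String(filler)));          // true
        System.out.println(fitsInConstantPool(new String(filler) + "\u00E9")); // false
    }
}
```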

FTR, we could have chosen char[], int[], or long[] (not byte[]) as the backing
store for string data.  With long[] we could have strings above 4G-chars.

But it would have come with a perf. tax, since the T[].length field would need
to be combined with an extra bit or two (from a flag byte) to complete the length.
That's 2-3 extra instructions for loading a string length, or else a redundant
length field.  So it's a trade-off.
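
A hypothetical illustration of that extra work, assuming a long[] store that packs four UTF-16 chars per word and records the number of unused slots in the last word in a flag byte. LongBackedLength, wordsFor and flagsFor are made-up names, not anything the team built. Compared with a single shift of byte[].length, length() here needs an extra load and mask of the flag byte plus a scale and a subtract.

```java
// Hypothetical layout, not JDK code: UTF-16 chars packed four per long.
// words.length alone can't tell 5 chars from 8, so every length() call
// must also consult the flag byte.
final class LongBackedLength {
    // Returns long because words.length * 4 can exceed the int range
    // that String::length is limited to.
    static long length(long[] words, byte flags) {
        int unusedSlots = flags & 0b11; // 0..3 unused chars in the last word
        return (long) words.length * 4 - unusedSlots;
    }

    // Words needed to hold the given number of chars.
    static long wordsFor(long chars) {
        return (chars + 3) / 4;
    }

    // Flag value (unused slots in the last word) for the given number of chars.
    static byte flagsFor(long chars) {
        return (byte) ((4 - chars % 4) % 4);
    }

    public static void main(String[] args) {
        long chars = 7;
        long[] words = new long[(int) wordsFor(chars)];
        System.out.println(length(words, flagsFor(chars))); // 7
    }
}
```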

Likewise, choosing a third format deepens branch depth in order to get to payload.

Likewise, making the second format (of two) have a length field embedded in the
payload section requires a conditional load or branch, in order to load the string
length.  Again, more instructions.

The team has looked at 20 possibilities like these.  The current design is fastest.
I hope it flies.

— John

Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Charles Oliver Nutter
On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:

> Yes, it's a known "limit" given the nature of the approach. It is not
> considered to be an "incompatible change", because the max length the
> String class and the corresponding buffer/builder classes can support is
> really an implementation detail, not a spec requirement. The conclusion
> from the discussion back then was that this is something we can trade off
> for the benefits we gain from the approach.
> Do we have a real use case that is impacted by this change?
>

Well, doesn't this mean that any code out there consuming String data
that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
OpenJDK 9?

Not that such a case is a particularly good pattern, but I'm sure there's
code out there doing it. On JRuby we routinely get bug reports complaining
that we can't support strings larger than 2GB (and we have used byte[] for
strings since 2006).

- Charlie


Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Xueming Shen

On 9/6/16, 10:09 AM, Tim Ellison wrote:
> Has it been noted that while JEP 254 reduces the space occupied by one
> byte per character strings, moving from a char[] to byte[]
> representation universally means that the maximum length of a UTF-16
> (two bytes per char) string is now halved?

Hi Tim,

Yes, it's a known "limit" given the nature of the approach. It is not
considered to be an "incompatible change", because the max length the
String class and the corresponding buffer/builder classes can support is
really an implementation detail, not a spec requirement. The conclusion from
the discussion back then was that this is something we can trade off for
the benefits we gain from the approach.

Do we have a real use case that is impacted by this change?

Thanks,
Sherman


> Since the goal is "preserving full compatibility", this has been missed
> by failing to allow for UTF-16 strings of length greater than
> Integer.MAX_VALUE / 2.
>
> Regards,
> Tim
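
The halving Tim describes is plain arithmetic on the maximum array length. A minimal sketch, assuming (optimistically) that the VM lets an array reach Integer.MAX_VALUE elements; real VMs typically stop a few elements short, so these are upper bounds. MaxLengthSketch is a hypothetical name.

```java
// Hypothetical back-of-the-envelope comparison of maximum String lengths.
// Assumes an array may hold Integer.MAX_VALUE elements (an upper bound).
final class MaxLengthSketch {
    // char[] backing (JDK 8 style): one array element per char.
    static long maxCharsWithCharArray() {
        return Integer.MAX_VALUE;
    }

    // byte[] backing (JDK 9 style): bytesPerChar is 1 for Latin-1, 2 for UTF-16.
    static long maxCharsWithByteArray(int bytesPerChar) {
        return Integer.MAX_VALUE / bytesPerChar;
    }

    public static void main(String[] args) {
        System.out.println("char[]:         " + maxCharsWithCharArray());  // 2147483647
        System.out.println("byte[] Latin-1: " + maxCharsWithByteArray(1)); // 2147483647
        System.out.println("byte[] UTF-16:  " + maxCharsWithByteArray(2)); // 1073741823
    }
}
```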