Hi Tagir,

On 2020-06-15 05:57, Tagir Valeev wrote:
Hello, Peter and Claes!

Cache locality might suffer a bit down the line since
String and byte[] becomes less likely to reside in the same slice
of heap memory.

There's a JEP 192 String Deduplication in G1 that does similar
optimization inside the GC (currently it's off by default).
The 'Risk and assumptions' section reads as:

Normally a String object and its corresponding character array will be placed 
next to each other in memory, leading to good cache locality. After 
deduplicating a String object its character array will be further away, which 
might introduce a slight performance overhead. Initial measurements have, 
however, shown that this does not seem to be a problem. In fact, after 
deduplication the total amount of accessed memory is reduced which tends to 
result in improved cache hit rates.

So I believe it would not be a big problem in our case either.

The moral difference is that String deduplication is an opt-in feature,
so users can decide if the trade-offs inherent is right for them. Here
we are making that choice for users, albeit for a very small subset of
all possible Strings. This saves a lot of cycles and memory up front,
which I think weighs in favor here (many Strings are short-lived).


You could simplify as "return new String(s{2,1});".

Thanks! Totally forgot about that String constructor.

Peter,
Also, for cases where one argument evaluates to empty string and the
other is not a String, I think you could simply return the seconds
string (not creating another instance with reused value/coder)\

I believe, this may still violate the spec. The resulting object must
be a 'newly created' while the custom toString() may return cached or
constant string. So we still need to copy it. This also applies to the
Claes' unaryConcat:

+    @ForceInline
+    static String unaryConcat(Object first) {
+        if (first instanceof String) {
+            // Must produce a new String
+            return new String((String)first);
+        } else {
+            return stringOf(first);
+        }
+    }

It should be just
// Must produce a new String
return new String(stringOf(first));

True.


So should I file a separate JBS issue for my optimization or both
cases (constant empty string and non-constant empty string) could be
handled under a single issue?

Up to you. I can fold the patches into one or do them separately, with
my unaryConcat as a follow-up.

/Claes


With best regards,
Tagir Valeev.

On Mon, Jun 15, 2020 at 4:04 AM Peter Levart <peter.lev...@gmail.com> wrote:

Hi Tagir,


I think that there might be cases where one of the arguments of
concatenation is constant empty String. I found myself sometimes doing
"" + someNonStringValue (instead of more readable
String.valueOf(someNonStringValue)) simply because of laziness. So does
this optimization help in that case as much as with non-constant "" +
"longlonglongline" ?


Also, for cases where one argument evaluates to empty string and the
other is not a String, I think you could simply return the seconds
string (not creating another instance with reused value/coder):


if (s1.isEmpty()) {

    return s2 == second ? new String(s2.value(), s2.coder()) : s2;

}


Regards, Peter


On 6/13/20 7:08 AM, Tagir Valeev wrote:
Hello!

It's quite possible that when we concatenate two strings, one of them
appears to be empty. We cannot simply return another string in this
case, as JLS 15.18.1 explicitly says (for unknown to me reason) about
the result of the string concatenation expression that 'The String
object is newly created'. However, it's still possible to reuse the
internal array of another string to reduce allocations:

--- src/java.base/share/classes/java/lang/StringConcatHelper.java
(revision 58998:04e3d254c76be87788a40cbd66d013140ea951d8)
+++ src/java.base/share/classes/java/lang/StringConcatHelper.java
(revision 58998+:04e3d254c76b+)
@@ -420,6 +420,12 @@
       static String simpleConcat(Object first, Object second) {
           String s1 = stringOf(first);
           String s2 = stringOf(second);
+        if (s1.isEmpty()) {
+            return new String(s2.value(), s2.coder());
+        }
+        if (s2.isEmpty()) {
+            return new String(s1.value(), s1.coder());
+        }
           // start "mixing" in length and coder or arguments, order is not
           // important
           long indexCoder = mix(initialCoder(), s2);

Very simple benchmark like this validates that the concatenation
became faster if one of the strings is empty:

    @Param({"", "longlonglongline"})
    String data;

    @Param({"", "longlonglongline"})
    String data2;

    @Benchmark
    public String plus() {
      return data + data2;
    }

Without patch I observe on VM 15-ea+20-899:

Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  Units
plus                                      avgt   30  15,335 ± 0,186  ns/op
plus                    longlonglongline  avgt   30  19,867 ± 0,109  ns/op
plus  longlonglongline                    avgt   30  20,283 ± 0,230  ns/op
plus  longlonglongline  longlonglongline  avgt   30  26,047 ± 0,230  ns/op

With patch:
Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  Units
plus                                      avgt   30   6,668 ± 0,055  ns/op
plus                    longlonglongline  avgt   30   6,708 ± 0,114  ns/op
plus  longlonglongline                    avgt   30   7,003 ± 0,064  ns/op
plus  longlonglongline  longlonglongline  avgt   30  25,126 ± 0,392  ns/op

There could be an added cost of up to two branches for the normal case
(I believe, if one of the strings is constant, then decent JIT can
eliminate one of the branches). However, I believe, the benefit could
outweigh it, as empty strings are not that uncommon and in this case,
we reduce O(N) time and memory consumption to O(1).

What do you think? Is this a reasonable thing to do? I can file an
issue and submit a proper webrev if it looks like a useful patch.

With best regards,
Tagir Valeev.

Reply via email to