Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) [v3]
On Wed, 13 Jan 2021 13:48:40 GMT, Claes Redestad wrote:

>> Сергей Цыпанов has updated the pull request with a new target base due to a
>> merge or a rebase. The incremental webrev excludes the unrelated changes
>> brought in by the merge/rebase. The pull request contains five additional
>> commits since the last revision:
>>
>>  - Merge branch 'master' into enc
>>  - 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) - small JavaDoc fix
>>  - Merge branch 'master' into enc
>>  - Merge branch 'master' into enc
>>  - Improve URLEncoder.encode(String, Charset)
>
> Looks good.
>
> I wonder... `CharArrayWriter` is an old and synchronized data structure, and
> since the instance used here isn't shared that synchronization seems useless.
> And since you're now bypassing the `char[]` and going straight for a `String`
> you might get better performance with a `StringBuilder` here? (`setLength(0)`
> instead of `reset()`...)

@cl4es done

-------------

PR: https://git.openjdk.java.net/jdk/pull/1598
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) [v3]
On Thu, 14 Jan 2021 12:48:18 GMT, Сергей Цыпанов wrote:

> Сергей Цыпанов has updated the pull request with a new target base due to a
> merge or a rebase. The incremental webrev excludes the unrelated changes
> brought in by the merge/rebase. The pull request contains five additional
> commits since the last revision:
>
>  - Merge branch 'master' into enc
>  - 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) - small JavaDoc fix
>  - Merge branch 'master' into enc
>  - Merge branch 'master' into enc
>  - Improve URLEncoder.encode(String, Charset)

Marked as reviewed by chegar (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/1598
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) [v3]
On Thu, 14 Jan 2021 11:48:16 GMT, Claes Redestad wrote:

>> @cl4es `StringBuilder` turns out to be a pessimization for both time and
>> memory, try `org.openjdk.bench.java.net.URLEncodeDecode`:
>> [...]
>> The reason seems to be single char `StringBuilder.append()` that apart from
>> range check does encoding check and stores `char` as two bytes in `byte[]`
>> in ASB
>
> Surprising, but thanks for checking!

No need to merge in changes from master unless there are conflicts or you want to make more changes of your own; they will be merged automatically on integration. But it seems the bots need you to do /integrate again.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1598
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) [v3]
> Instead of allocating a copy of the underlying array via
> `CharArrayWriter.toCharArray()` and passing it to the constructor of String
>
>     String str = new String(charArrayWriter.toCharArray());
>
> we could call the `toString()` method
>
>     String str = charArrayWriter.toString();
>
> decoding the existing char[] without making a copy. This slightly speeds up
> the method and at the same time reduces memory consumption when encoding URLs
> with non-Latin symbols:
>
>     import java.net.URLEncoder;
>     import java.nio.charset.Charset;
>     import java.util.concurrent.TimeUnit;
>     import org.openjdk.jmh.annotations.*;
>
>     @State(Scope.Thread)
>     @BenchmarkMode(Mode.AverageTime)
>     @OutputTimeUnit(TimeUnit.NANOSECONDS)
>     @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>     public class UrlEncoderBenchmark {
>       private static final Charset charset = Charset.defaultCharset();
>       private static final String utf8Url = "https://ru.wikipedia.org/wiki/Организация_Объединённых_Наций"; // UN
>
>       @Benchmark
>       public String encodeUtf8() {
>         return URLEncoder.encode(utf8Url, charset);
>       }
>     }
>
> The benchmark on my machine gives the following output:
>
> before
>
>     Benchmark                                                        Mode  Cnt     Score    Error  Units
>     UrlEncoderBenchmark.encodeUtf8                                   avgt  100  1166.378 ±  8.411  ns/op
>     UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate                    avgt  100   932.944 ±  6.393  MB/sec
>     UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm               avgt  100  1712.193 ±  0.005  B/op
>     UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space           avgt  100   929.221 ± 24.268  MB/sec
>     UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space.norm      avgt  100  1705.444 ± 43.235  B/op
>     UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space       avgt  100     0.006 ±  0.001  MB/sec
>     UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space.norm  avgt  100     0.011 ±  0.002  B/op
>     UrlEncoderBenchmark.encodeUtf8:·gc.count                         avgt  100   652.000           counts
>     UrlEncoderBenchmark.encodeUtf8:·gc.time                          avgt  100   334.000           ms
>
> after
>
>     Benchmark                                                        Mode  Cnt     Score    Error  Units
>     UrlEncoderBenchmark.encodeUtf8                                   avgt  100  1058.851 ±  6.006  ns/op
>     UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate                    avgt  100   931.489 ±  5.182  MB/sec
>     UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm               avgt  100  1552.176 ±  0.005  B/op
>     UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space           avgt  100   933.491 ± 24.164  MB/sec
>     UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space.norm      avgt  100  1555.488 ± 39.204  B/op
>     UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space       avgt  100     0.006 ±  0.001  MB/sec
>     UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space.norm  avgt  100     0.010 ±  0.002  B/op
>     UrlEncoderBenchmark.encodeUtf8:·gc.count                         avgt  100   655.000           counts
>     UrlEncoderBenchmark.encodeUtf8:·gc.time                          avgt  100   333.000           ms

Сергей Цыпанов has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Merge branch 'master' into enc
 - 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) - small JavaDoc fix
 - Merge branch 'master' into enc
 - Merge branch 'master' into enc
 - Improve URLEncoder.encode(String, Charset)

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1598/files
  - new: https://git.openjdk.java.net/jdk/pull/1598/files/ae293c8b..2183af4c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1598&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1598&range=01-02

  Stats: 3325 lines in 139 files changed: 1985 ins; 870 del; 470 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1598.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1598/head:pull/1598

PR: https://git.openjdk.java.net/jdk/pull/1598
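As an illustration of the change described in this message, here is a minimal sketch of the two ways to turn a `CharArrayWriter` into a `String`. The class and method names (`WriterToStringSketch`, `viaCharArrayCopy`, `viaToString`) are hypothetical and are not taken from the patch; the sketch only shows that both paths yield the same text, while the first one goes through an extra `char[]` copy made by `toCharArray()`. It says nothing about timing, which would need a JMH run like the one quoted above.

```java
import java.io.CharArrayWriter;

final class WriterToStringSketch {

    // Old shape: toCharArray() returns a copy of the writer's buffer,
    // and the String constructor then processes that copy.
    static String viaCharArrayCopy(CharArrayWriter writer) {
        return new String(writer.toCharArray());
    }

    // New shape: toString() builds the String directly from the
    // writer's internal buffer, skipping the intermediate copy.
    static String viaToString(CharArrayWriter writer) {
        return writer.toString();
    }

    public static void main(String[] args) {
        // "\u041e\u041e\u041d" is a short Cyrillic sample, written with
        // escapes so the source file stays ASCII-only.
        String sample = "\u041e\u041e\u041d";
        CharArrayWriter writer = new CharArrayWriter();
        writer.write(sample, 0, sample.length());

        // Both conversions are expected to produce the same String.
        System.out.println(viaCharArrayCopy(writer).equals(viaToString(writer)));
    }
}
```

The three-argument `write(String, int, int)` is used because, unlike the inherited `write(String)`, the `CharArrayWriter` override does not declare `IOException`.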
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) [v2]
On Thu, 14 Jan 2021 09:31:19 GMT, Сергей Цыпанов wrote:

>> Looks good.
>>
>> I wonder... `CharArrayWriter` is an old and synchronized data structure, and
>> since the instance used here isn't shared that synchronization seem useless.
>> And since you're now bypassing the `char[]` and going straight for a
>> `String` you might get better performance with a `StringBuilder` here?
>> (`setLength(0)` instead of `reset()`...)
>
> @cl4es SB brings pessimization both for time and memory, try
> `org.openjdk.bench.java.net.URLEncodeDecode`:
>
> master
>
>     Benchmark                                        (count)  (maxLength)  (mySeed)  Mode  Cnt         Score       Error  Units
>     testEncodeUTF8                                      1024         1024         3  avgt   25         8.573 ±     0.023  ms/op
>     testEncodeUTF8:·gc.alloc.rate                       1024         1024         3  avgt   25      1202.896 ±     3.225  MB/sec
>     testEncodeUTF8:·gc.alloc.rate.norm                  1024         1024         3  avgt   25  11355727.904 ±   196.249  B/op
>     testEncodeUTF8:·gc.churn.G1_Eden_Space              1024         1024         3  avgt   25      1203.785 ±     6.240  MB/sec
>     testEncodeUTF8:·gc.churn.G1_Eden_Space.norm         1024         1024         3  avgt   25  11364143.637 ± 52830.222  B/op
>     testEncodeUTF8:·gc.churn.G1_Survivor_Space          1024         1024         3  avgt   25         0.008 ±     0.001  MB/sec
>     testEncodeUTF8:·gc.churn.G1_Survivor_Space.norm     1024         1024         3  avgt   25        77.088 ±     9.303  B/op
>     testEncodeUTF8:·gc.count                            1024         1024         3  avgt   25      1973.000              counts
>     testEncodeUTF8:·gc.time                             1024         1024         3  avgt   25       996.000              ms
>
> enc
>
>     Benchmark                                        (count)  (maxLength)  (mySeed)  Mode  Cnt         Score       Error  Units
>     testEncodeUTF8                                      1024         1024         3  avgt   25         7.931 ±     0.006  ms/op
>     testEncodeUTF8:·gc.alloc.rate                       1024         1024         3  avgt   25       965.347 ±     0.736  MB/sec
>     testEncodeUTF8:·gc.alloc.rate.norm                  1024         1024         3  avgt   25   8430590.163 ±     7.213  B/op
>     testEncodeUTF8:·gc.churn.G1_Eden_Space              1024         1024         3  avgt   25       966.373 ±     5.248  MB/sec
>     testEncodeUTF8:·gc.churn.G1_Eden_Space.norm         1024         1024         3  avgt   25   8439563.689 ± 47282.178  B/op
>     testEncodeUTF8:·gc.churn.G1_Survivor_Space          1024         1024         3  avgt   25         0.007 ±     0.001  MB/sec
>     testEncodeUTF8:·gc.churn.G1_Survivor_Space.norm     1024         1024         3  avgt   25        60.949 ±     8.405  B/op
>     testEncodeUTF8:·gc.count                            1024         1024         3  avgt   25      1715.000              counts
>     testEncodeUTF8:·gc.time                             1024         1024         3  avgt   25       888.000              ms
>
> stringBuilder
>
>     Benchmark                                        (count)  (maxLength)  (mySeed)  Mode  Cnt         Score       Error  Units
>     testEncodeUTF8                                      1024         1024         3  avgt   25         8.115 ±     0.110  ms/op
>     testEncodeUTF8:·gc.alloc.rate                       1024         1024         3  avgt   25      1259.267 ±    16.716  MB/sec
>     testEncodeUTF8:·gc.alloc.rate.norm                  1024         1024         3  avgt   25  11249391.875 ±     6.552  B/op
>     testEncodeUTF8:·gc.churn.G1_Eden_Space              1024         1024         3  avgt   25      1259.937 ±    17.232  MB/sec
>     testEncodeUTF8:·gc.churn.G1_Eden_Space.norm         1024         1024         3  avgt   25  11255413.875 ± 43636.143  B/op
>     testEncodeUTF8:·gc.churn.G1_Survivor_Space          1024         1024         3  avgt   25         0.007 ±     0.001  MB/sec
>     testEncodeUTF8:·gc.churn.G1_Survivor_Space.norm     1024         1024         3  avgt   25        59.461 ±     9.087  B/op
>     testEncodeUTF8:·gc.count                            1024         1024         3  avgt   25      2236.000              counts
>     testEncodeUTF8:·gc.time                             1024         1024         3  avgt   25      1089.000              ms
>
> The reason seems to be single char `StringBuilder.append()` that apart from
> range check does encoding check and stores `char` as two bytes in `byte[]` in
> ASB

Surprising, but thanks for checking!

-------------

PR: https://git.openjdk.java.net/jdk/pull/1598
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) [v2]
> Instead of allocating a copy of the underlying array via
> `CharArrayWriter.toCharArray()` and passing it to the constructor of String
>
>     String str = new String(charArrayWriter.toCharArray());
>
> we could call the `toString()` method
>
>     String str = charArrayWriter.toString();
>
> decoding the existing char[] without making a copy. This slightly speeds up
> the method and at the same time reduces memory consumption when encoding URLs
> with non-Latin symbols:
> [...]

Сергей Цыпанов has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset) - small JavaDoc fix
 - Merge branch 'master' into enc
 - Merge branch 'master' into enc
 - Improve URLEncoder.encode(String, Charset)

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1598/files
  - new: https://git.openjdk.java.net/jdk/pull/1598/files/2856c923..ae293c8b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1598&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1598&range=00-01

  Stats: 31439 lines in 1018 files changed: 11701 ins; 8302 del; 11436 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1598.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1598/head:pull/1598

PR: https://git.openjdk.java.net/jdk/pull/1598
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset)
On Wed, 13 Jan 2021 13:48:40 GMT, Claes Redestad wrote:

> Looks good.
>
> I wonder... `CharArrayWriter` is an old and synchronized data structure, and
> since the instance used here isn't shared that synchronization seems useless.
> And since you're now bypassing the `char[]` and going straight for a `String`
> you might get better performance with a `StringBuilder` here? (`setLength(0)`
> instead of `reset()`...)

@cl4es `StringBuilder` turns out to be a pessimization for both time and memory, try `org.openjdk.bench.java.net.URLEncodeDecode`:

master

    Benchmark                                        (count)  (maxLength)  (mySeed)  Mode  Cnt         Score       Error  Units
    testEncodeUTF8                                      1024         1024         3  avgt   25         8.573 ±     0.023  ms/op
    testEncodeUTF8:·gc.alloc.rate                       1024         1024         3  avgt   25      1202.896 ±     3.225  MB/sec
    testEncodeUTF8:·gc.alloc.rate.norm                  1024         1024         3  avgt   25  11355727.904 ±   196.249  B/op
    testEncodeUTF8:·gc.churn.G1_Eden_Space              1024         1024         3  avgt   25      1203.785 ±     6.240  MB/sec
    testEncodeUTF8:·gc.churn.G1_Eden_Space.norm         1024         1024         3  avgt   25  11364143.637 ± 52830.222  B/op
    testEncodeUTF8:·gc.churn.G1_Survivor_Space          1024         1024         3  avgt   25         0.008 ±     0.001  MB/sec
    testEncodeUTF8:·gc.churn.G1_Survivor_Space.norm     1024         1024         3  avgt   25        77.088 ±     9.303  B/op
    testEncodeUTF8:·gc.count                            1024         1024         3  avgt   25      1973.000              counts
    testEncodeUTF8:·gc.time                             1024         1024         3  avgt   25       996.000
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset)
On Wed, 13 Jan 2021 13:48:40 GMT, Claes Redestad wrote:

> Looks good.
>
> I wonder... `CharArrayWriter` is an old and synchronized data structure, and
> since the instance used here isn't shared that synchronization seem useless.
> And since you're now bypassing the `char[]` and going straight for a `String`
> you might get better performance with a `StringBuilder` here? (`setLength(0)`
> instead of `reset()`...)

@cl4es hi, let me try `StringBuilder`

-------------

PR: https://git.openjdk.java.net/jdk/pull/1598
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset)
On Thu, 3 Dec 2020 14:29:58 GMT, Сергей Цыпанов wrote:

> Instead of allocating a copy of the underlying array via
> `CharArrayWriter.toCharArray()` and passing it to the constructor of String
>
>     String str = new String(charArrayWriter.toCharArray());
>
> we could call the `toString()` method
>
>     String str = charArrayWriter.toString();
>
> decoding the existing char[] without making a copy. This slightly speeds up
> the method and at the same time reduces memory consumption when encoding URLs
> with non-Latin symbols:
> [...]

Looks good.

I wonder... `CharArrayWriter` is an old and synchronized data structure, and since the instance used here isn't shared, that synchronization seems useless. And since you're now bypassing the `char[]` and going straight for a `String`, you might get better performance with a `StringBuilder` here? (`setLength(0)` instead of `reset()`...)

-------------

Marked as reviewed by redestad (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1598
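To make the suggestion in this review concrete, here is a rough sketch of the two buffer-reuse patterns it contrasts: a reused `CharArrayWriter` cleared with `reset()` versus a reused `StringBuilder` cleared with `setLength(0)`. The class `BufferReuseSketch`, its methods, and the `|` separator are hypothetical stand-ins, not the actual `URLEncoder` code; the sketch assumes the encoder collects runs of characters one `char` at a time and only checks that both patterns produce the same text. Which one is faster is an empirical question that has to be answered with a benchmark.

```java
import java.io.CharArrayWriter;

final class BufferReuseSketch {

    // Pattern A: one CharArrayWriter reused across runs, cleared with reset().
    static String joinWithWriter(String[] runs) {
        CharArrayWriter buffer = new CharArrayWriter();
        StringBuilder out = new StringBuilder();
        for (String run : runs) {
            for (int i = 0; i < run.length(); i++) {
                buffer.write(run.charAt(i));   // single-char write(int)
            }
            out.append(buffer.toString()).append('|');
            buffer.reset();                    // keep the array, drop the content
        }
        return out.toString();
    }

    // Pattern B: one StringBuilder reused across runs, cleared with setLength(0).
    static String joinWithBuilder(String[] runs) {
        StringBuilder buffer = new StringBuilder();
        StringBuilder out = new StringBuilder();
        for (String run : runs) {
            for (int i = 0; i < run.length(); i++) {
                buffer.append(run.charAt(i));  // single-char append(char)
            }
            out.append(buffer.toString()).append('|');
            buffer.setLength(0);               // keep the capacity, drop the content
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] runs = {"ab", "\u041e\u041e\u041d"};
        System.out.println(joinWithWriter(runs).equals(joinWithBuilder(runs)));
    }
}
```

Both variants avoid allocating a fresh buffer per run; they differ only in the per-character work done by `write(int)` and `append(char)`.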
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset)
On Thu, 3 Dec 2020 14:29:58 GMT, Сергей Цыпанов wrote:

> Instead of allocating a copy of the underlying array via
> `CharArrayWriter.toCharArray()` and passing it to the constructor of String
>
>     String str = new String(charArrayWriter.toCharArray());
>
> we could call the `toString()` method
>
>     String str = charArrayWriter.toString();
>
> decoding the existing char[] without making a copy. This slightly speeds up
> the method and at the same time reduces memory consumption when encoding URLs
> with non-Latin symbols:
> [...]

Looks good!

-------------

Marked as reviewed by attila (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1598
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset)
On Mon, 11 Jan 2021 09:28:46 GMT, Сергей Цыпанов wrote:

>> Looks good!
>
> @szegedi could you please create a ticket for this change? Otherwise I cannot
> merge it

Filed a ticket, you should rename this PR to "8259699: Reduce char[] copying in URLEncoder.encode(String, Charset)"

-------------

PR: https://git.openjdk.java.net/jdk/pull/1598
Re: RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset)
On Sun, 10 Jan 2021 20:13:51 GMT, Attila Szegedi wrote:

> Looks good!

@szegedi could you please create a ticket for this change? Otherwise I cannot merge it

-------------

PR: https://git.openjdk.java.net/jdk/pull/1598
RFR: 8259699: Reduce char[] copying in URLEncoder.encode(String, Charset)
Instead of allocating a copy of the underlying array via `CharArrayWriter.toCharArray()` and passing it to the String constructor

    String str = new String(charArrayWriter.toCharArray());

we could call the `toString()` method

    String str = charArrayWriter.toString();

which builds the String directly from the existing char[] without the intermediate copy. This slightly speeds up the method and at the same time reduces memory consumption when encoding URLs with non-Latin symbols:

    import java.net.URLEncoder;
    import java.nio.charset.Charset;
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
    public class UrlEncoderBenchmark {
      private static final Charset charset = Charset.defaultCharset();
      // Russian Wikipedia article on the United Nations (UN)
      private static final String utf8Url = "https://ru.wikipedia.org/wiki/Организация_Объединённых_Наций";

      @Benchmark
      public String encodeUtf8() {
        return URLEncoder.encode(utf8Url, charset);
      }
    }

The benchmark on my machine gives the following output:

before

    Benchmark                                                        Mode  Cnt     Score     Error   Units
    UrlEncoderBenchmark.encodeUtf8                                   avgt  100  1166.378 ±  8.411   ns/op
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate                    avgt  100   932.944 ±  6.393  MB/sec
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm               avgt  100  1712.193 ±  0.005    B/op
    UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space           avgt  100   929.221 ± 24.268  MB/sec
    UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space.norm      avgt  100  1705.444 ± 43.235    B/op
    UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space       avgt  100     0.006 ±  0.001  MB/sec
    UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space.norm  avgt  100     0.011 ±  0.002    B/op
    UrlEncoderBenchmark.encodeUtf8:·gc.count                         avgt  100   652.000           counts
    UrlEncoderBenchmark.encodeUtf8:·gc.time                          avgt  100   334.000               ms

after

    Benchmark                                                        Mode  Cnt     Score     Error   Units
    UrlEncoderBenchmark.encodeUtf8                                   avgt  100  1058.851 ±  6.006   ns/op
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate                    avgt  100   931.489 ±  5.182  MB/sec
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm               avgt  100  1552.176 ±  0.005    B/op
    UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space           avgt  100   933.491 ± 24.164  MB/sec
    UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space.norm      avgt  100  1555.488 ± 39.204    B/op
    UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space       avgt  100     0.006 ±  0.001  MB/sec
    UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space.norm  avgt  100     0.010 ±  0.002    B/op
    UrlEncoderBenchmark.encodeUtf8:·gc.count                         avgt  100   655.000           counts
    UrlEncoderBenchmark.encodeUtf8:·gc.time                          avgt  100   333.000               ms

-------------

Commit messages:
 - Merge branch 'master' into enc
 - Improve URLEncoder.encode(String, Charset)

Changes: https://git.openjdk.java.net/jdk/pull/1598/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1598&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8259699
  Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1598.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1598/head:pull/1598

PR: https://git.openjdk.java.net/jdk/pull/1598
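
For readers who want to see the two patterns from the description side by side, here is a minimal standalone sketch. It is not the code from the patch: the class name `CharArrayWriterCopyDemo` and the helper methods `viaToCharArray` and `viaToString` are hypothetical names chosen for illustration, and the sample text is just a fragment of the benchmark URL. Only `CharArrayWriter` calls from the standard library are used.

```java
import java.io.CharArrayWriter;

// Hypothetical demo class, not part of the JDK or of the patch under review.
class CharArrayWriterCopyDemo {

    // Old pattern: toCharArray() allocates a trimmed copy of the writer's
    // internal buffer, and the String constructor then copies the data again.
    static String viaToCharArray(CharArrayWriter writer) {
        return new String(writer.toCharArray());
    }

    // New pattern: toString() creates the String from the writer's internal
    // buffer, so the intermediate char[] is never allocated.
    static String viaToString(CharArrayWriter writer) {
        return writer.toString();
    }

    public static void main(String[] args) {
        String sample = "Организация_Объединённых_Наций";

        CharArrayWriter writer = new CharArrayWriter();
        // write(String, int, int) is overridden in CharArrayWriter and does
        // not declare IOException.
        writer.write(sample, 0, sample.length());

        String copied = viaToCharArray(writer);
        String direct = viaToString(writer);

        // Both paths produce the same text; only the allocation pattern differs.
        System.out.println(copied.equals(direct));

        // reset() empties the writer while keeping its buffer for reuse.
        writer.reset();
        System.out.println(writer.size());
    }
}
```

The two helpers return equal strings, so the change is purely about allocation. As far as I know, `CharArrayWriter.toString()` in OpenJDK is implemented as `new String(buf, 0, count)`, which would be consistent with `gc.alloc.rate.norm` dropping from 1712.193 to 1552.176 B/op in the numbers above, though this sketch does not measure anything itself.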