Re: [11u] RFR 8259886: Improve SSL session cache performance and scalability

2021-03-19 Thread Daniel Jeliński
Great, thanks!
I tried to follow the instructions here:
https://wiki.openjdk.java.net/display/JDKUpdates/How+to+contribute+a+fix
The hg export recipe was not available, so I substituted git format-patch.
I wasn't aware of git hg-export. I think it should be mentioned on that
wiki.
Thanks again,
Daniel

pt., 19 mar 2021, 16:21 użytkownik Hohensee, Paul 
napisał:

> Pushed. In the future, it'd be great if you used "git hg-export" (one of
> the Skara tools) to generate the base changeset, because it includes the
> commit metadata we want to preserve in the 11u hg repo.
>
> Thanks,
> Paul
>
> -Original Message-
> From: security-dev  on behalf of
> "Hohensee, Paul" 
> Date: Wednesday, March 17, 2021 at 1:01 PM
> To: Daniel Jeliński 
> Cc: "jdk-updates-...@openjdk.java.net" ,
> "security-dev@openjdk.java.net" 
> Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and
> scalability
>
> Np, tagged.
>
> -Original Message-
> From: Daniel Jeliński 
> Date: Tuesday, March 16, 2021 at 11:40 PM
> To: "Hohensee, Paul" 
> Cc: "jdk-updates-...@openjdk.java.net" ,
> "security-dev@openjdk.java.net" 
> Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and
> scalability
>
> Thanks again Paul.
>
> Could you sponsor the change? As far as I can tell, I'd need to add a
> fix request now, but I don't have access to issue tracker.
> Thanks,
> Daniel
>
> wt., 16 mar 2021 o 18:59 Hohensee, Paul  napisał(a):
> >
> > Looks good! :)
> >
> > -Original Message-
> > From: Daniel Jeliński 
> > Date: Tuesday, March 16, 2021 at 9:49 AM
> > To: "Hohensee, Paul" 
> > Cc: "jdk-updates-...@openjdk.java.net" ,
> "security-dev@openjdk.java.net" 
> > Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance
> and scalability
> >
> > Thanks Paul for your review and for the hint.
> >
> > Updated webrev:
> https://djelinski.github.io/8259886-11u/webrev2/index.html
> >
> > compared to original, changes to make/test/BuildMicrobenchmark.gmk
> > were dropped because file does not exist in jdk11
> > compared to previous webrev, CacheBench was re-added.
> >
> > Testing: Linux x86_64 tier1 and tier2.
> > Thanks,
> > Daniel
> >
> > pon., 15 mar 2021 o 18:09 Hohensee, Paul 
> napisał(a):
> > >
> > > The changes to Cache.java look fine, but please include
> CacheBench.java. I'd like to see 11u to stand on its own without reference
> to later releases, plus I believe the 11u maintainers prefer to backport as
> much of a patch as possible.
> > >
> > > Thanks,
> > > Paul
> > >
> > > -Original Message-
> > > From: security-dev  on behalf of
> Daniel Jeliński 
> > > Date: Tuesday, March 9, 2021 at 3:37 PM
> > > To: "jdk-updates-...@openjdk.java.net" <
> jdk-updates-...@openjdk.java.net>
> > > Cc: "security-dev@openjdk.java.net" 
> > > Subject: [11u] RFR 8259886: Improve SSL session cache performance and
> scalability
> > >
> > > Hi,
> > > Please review this 11u backport; this is the same patch as for head,
> > > except for microbenchmark makefile changes that did not apply because
> > > the file doesn't exist in 11u, and the actual microbenchmark, which
> > > would only add weight for no benefit.
> > >
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8259886
> > > webrev: https://djelinski.github.io/8259886-11u/webrev/index.html
> > >
> > > Testing: Linux x86_64 tier1 and tier2.
> > >
> > > Thanks,
> > > Daniel
> > >
> >
>
>
>


Re: [11u] RFR 8259886: Improve SSL session cache performance and scalability

2021-03-19 Thread Hohensee, Paul
Pushed. In the future, it'd be great if you used "git hg-export" (one of the 
Skara tools) to generate the base changeset, because it includes the commit 
metadata we want to preserve in the 11u hg repo.

Thanks,
Paul

-Original Message-
From: security-dev  on behalf of "Hohensee, 
Paul" 
Date: Wednesday, March 17, 2021 at 1:01 PM
To: Daniel Jeliński 
Cc: "jdk-updates-...@openjdk.java.net" , 
"security-dev@openjdk.java.net" 
Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and 
scalability

Np, tagged.

-Original Message-
From: Daniel Jeliński 
Date: Tuesday, March 16, 2021 at 11:40 PM
To: "Hohensee, Paul" 
Cc: "jdk-updates-...@openjdk.java.net" , 
"security-dev@openjdk.java.net" 
Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and 
scalability

Thanks again Paul.

Could you sponsor the change? As far as I can tell, I'd need to add a
fix request now, but I don't have access to issue tracker.
Thanks,
Daniel

wt., 16 mar 2021 o 18:59 Hohensee, Paul  napisał(a):
>
> Looks good! :)
>
> -Original Message-
> From: Daniel Jeliński 
> Date: Tuesday, March 16, 2021 at 9:49 AM
> To: "Hohensee, Paul" 
> Cc: "jdk-updates-...@openjdk.java.net" , 
> "security-dev@openjdk.java.net" 
> Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and 
> scalability
>
> Thanks Paul for your review and for the hint.
>
> Updated webrev: https://djelinski.github.io/8259886-11u/webrev2/index.html
>
> compared to original, changes to make/test/BuildMicrobenchmark.gmk
> were dropped because file does not exist in jdk11
> compared to previous webrev, CacheBench was re-added.
>
> Testing: Linux x86_64 tier1 and tier2.
> Thanks,
> Daniel
>
> pon., 15 mar 2021 o 18:09 Hohensee, Paul  napisał(a):
> >
> > The changes to Cache.java look fine, but please include CacheBench.java. 
> > I'd like to see 11u to stand on its own without reference to later 
> > releases, plus I believe the 11u maintainers prefer to backport as much of 
> > a patch as possible.
> >
> > Thanks,
> > Paul
> >
> > -----Original Message-
> > From: security-dev  on behalf of Daniel 
> > Jeliński 
> > Date: Tuesday, March 9, 2021 at 3:37 PM
> > To: "jdk-updates-...@openjdk.java.net" 
> > Cc: "security-dev@openjdk.java.net" 
> > Subject: [11u] RFR 8259886: Improve SSL session cache performance and 
> > scalability
> >
> > Hi,
> > Please review this 11u backport; this is the same patch as for head,
> > except for microbenchmark makefile changes that did not apply because
> > the file doesn't exist in 11u, and the actual microbenchmark, which
> > would only add weight for no benefit.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8259886
> > webrev: https://djelinski.github.io/8259886-11u/webrev/index.html
> >
> > Testing: Linux x86_64 tier1 and tier2.
> >
> > Thanks,
> > Daniel
> >
>




RE: [11u] RFR 8259886: Improve SSL session cache performance and scalability

2021-03-17 Thread Hohensee, Paul
Np, tagged.

-Original Message-
From: Daniel Jeliński 
Date: Tuesday, March 16, 2021 at 11:40 PM
To: "Hohensee, Paul" 
Cc: "jdk-updates-...@openjdk.java.net" , 
"security-dev@openjdk.java.net" 
Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and 
scalability

Thanks again Paul.

Could you sponsor the change? As far as I can tell, I'd need to add a
fix request now, but I don't have access to issue tracker.
Thanks,
Daniel

wt., 16 mar 2021 o 18:59 Hohensee, Paul  napisał(a):
>
> Looks good! :)
>
> -Original Message-
> From: Daniel Jeliński 
> Date: Tuesday, March 16, 2021 at 9:49 AM
> To: "Hohensee, Paul" 
> Cc: "jdk-updates-...@openjdk.java.net" , 
> "security-dev@openjdk.java.net" 
> Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and 
> scalability
>
> Thanks Paul for your review and for the hint.
>
> Updated webrev: https://djelinski.github.io/8259886-11u/webrev2/index.html
>
> compared to original, changes to make/test/BuildMicrobenchmark.gmk
> were dropped because file does not exist in jdk11
> compared to previous webrev, CacheBench was re-added.
>
> Testing: Linux x86_64 tier1 and tier2.
> Thanks,
> Daniel
>
> pon., 15 mar 2021 o 18:09 Hohensee, Paul  napisał(a):
> >
> > The changes to Cache.java look fine, but please include CacheBench.java. 
> > I'd like to see 11u to stand on its own without reference to later 
> > releases, plus I believe the 11u maintainers prefer to backport as much of 
> > a patch as possible.
> >
> > Thanks,
> > Paul
> >
> > -Original Message-----
> > From: security-dev  on behalf of Daniel 
> > Jeliński 
> > Date: Tuesday, March 9, 2021 at 3:37 PM
> > To: "jdk-updates-...@openjdk.java.net" 
> > Cc: "security-dev@openjdk.java.net" 
> > Subject: [11u] RFR 8259886: Improve SSL session cache performance and 
> > scalability
> >
> > Hi,
> > Please review this 11u backport; this is the same patch as for head,
> > except for microbenchmark makefile changes that did not apply because
> > the file doesn't exist in 11u, and the actual microbenchmark, which
> > would only add weight for no benefit.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8259886
> > webrev: https://djelinski.github.io/8259886-11u/webrev/index.html
> >
> > Testing: Linux x86_64 tier1 and tier2.
> >
> > Thanks,
> > Daniel
> >
>



Re: [11u] RFR 8259886: Improve SSL session cache performance and scalability

2021-03-16 Thread Daniel Jeliński
Thanks again Paul.

Could you sponsor the change? As far as I can tell, I'd need to add a
fix request now, but I don't have access to issue tracker.
Thanks,
Daniel

wt., 16 mar 2021 o 18:59 Hohensee, Paul  napisał(a):
>
> Looks good! :)
>
> -Original Message-
> From: Daniel Jeliński 
> Date: Tuesday, March 16, 2021 at 9:49 AM
> To: "Hohensee, Paul" 
> Cc: "jdk-updates-...@openjdk.java.net" , 
> "security-dev@openjdk.java.net" 
> Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and 
> scalability
>
> Thanks Paul for your review and for the hint.
>
> Updated webrev: https://djelinski.github.io/8259886-11u/webrev2/index.html
>
> compared to original, changes to make/test/BuildMicrobenchmark.gmk
> were dropped because file does not exist in jdk11
> compared to previous webrev, CacheBench was re-added.
>
> Testing: Linux x86_64 tier1 and tier2.
> Thanks,
> Daniel
>
> pon., 15 mar 2021 o 18:09 Hohensee, Paul  napisał(a):
> >
> > The changes to Cache.java look fine, but please include CacheBench.java. 
> > I'd like to see 11u to stand on its own without reference to later 
> > releases, plus I believe the 11u maintainers prefer to backport as much of 
> > a patch as possible.
> >
> > Thanks,
> > Paul
> >
> > -Original Message-
> > From: security-dev  on behalf of Daniel 
> > Jeliński 
> > Date: Tuesday, March 9, 2021 at 3:37 PM
> > To: "jdk-updates-...@openjdk.java.net" 
> > Cc: "security-dev@openjdk.java.net" 
> > Subject: [11u] RFR 8259886: Improve SSL session cache performance and 
> > scalability
> >
> > Hi,
> > Please review this 11u backport; this is the same patch as for head,
> > except for microbenchmark makefile changes that did not apply because
> > the file doesn't exist in 11u, and the actual microbenchmark, which
> > would only add weight for no benefit.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8259886
> > webrev: https://djelinski.github.io/8259886-11u/webrev/index.html
> >
> > Testing: Linux x86_64 tier1 and tier2.
> >
> > Thanks,
> > Daniel
> >
>


RE: [11u] RFR 8259886: Improve SSL session cache performance and scalability

2021-03-16 Thread Hohensee, Paul
Looks good! :)

-Original Message-
From: Daniel Jeliński 
Date: Tuesday, March 16, 2021 at 9:49 AM
To: "Hohensee, Paul" 
Cc: "jdk-updates-...@openjdk.java.net" , 
"security-dev@openjdk.java.net" 
Subject: RE: [11u] RFR 8259886: Improve SSL session cache performance and 
scalability

Thanks Paul for your review and for the hint.

Updated webrev: https://djelinski.github.io/8259886-11u/webrev2/index.html

compared to original, changes to make/test/BuildMicrobenchmark.gmk
were dropped because file does not exist in jdk11
compared to previous webrev, CacheBench was re-added.

Testing: Linux x86_64 tier1 and tier2.
Thanks,
Daniel

pon., 15 mar 2021 o 18:09 Hohensee, Paul  napisał(a):
>
> The changes to Cache.java look fine, but please include CacheBench.java. I'd 
> like to see 11u to stand on its own without reference to later releases, plus 
> I believe the 11u maintainers prefer to backport as much of a patch as 
> possible.
>
> Thanks,
> Paul
>
> -Original Message-
> From: security-dev  on behalf of Daniel 
> Jeliński 
> Date: Tuesday, March 9, 2021 at 3:37 PM
> To: "jdk-updates-...@openjdk.java.net" 
> Cc: "security-dev@openjdk.java.net" 
> Subject: [11u] RFR 8259886: Improve SSL session cache performance and 
> scalability
>
> Hi,
> Please review this 11u backport; this is the same patch as for head,
> except for microbenchmark makefile changes that did not apply because
> the file doesn't exist in 11u, and the actual microbenchmark, which
> would only add weight for no benefit.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8259886
> webrev: https://djelinski.github.io/8259886-11u/webrev/index.html
>
> Testing: Linux x86_64 tier1 and tier2.
>
> Thanks,
> Daniel
>



Re: [11u] RFR 8259886: Improve SSL session cache performance and scalability

2021-03-16 Thread Daniel Jeliński
Thanks Paul for your review and for the hint.

Updated webrev: https://djelinski.github.io/8259886-11u/webrev2/index.html

compared to original, changes to make/test/BuildMicrobenchmark.gmk
were dropped because file does not exist in jdk11
compared to previous webrev, CacheBench was re-added.

Testing: Linux x86_64 tier1 and tier2.
Thanks,
Daniel

pon., 15 mar 2021 o 18:09 Hohensee, Paul  napisał(a):
>
> The changes to Cache.java look fine, but please include CacheBench.java. I'd 
> like to see 11u to stand on its own without reference to later releases, plus 
> I believe the 11u maintainers prefer to backport as much of a patch as 
> possible.
>
> Thanks,
> Paul
>
> -Original Message-
> From: security-dev  on behalf of Daniel 
> Jeliński 
> Date: Tuesday, March 9, 2021 at 3:37 PM
> To: "jdk-updates-...@openjdk.java.net" 
> Cc: "security-dev@openjdk.java.net" 
> Subject: [11u] RFR 8259886: Improve SSL session cache performance and 
> scalability
>
> Hi,
> Please review this 11u backport; this is the same patch as for head,
> except for microbenchmark makefile changes that did not apply because
> the file doesn't exist in 11u, and the actual microbenchmark, which
> would only add weight for no benefit.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8259886
> webrev: https://djelinski.github.io/8259886-11u/webrev/index.html
>
> Testing: Linux x86_64 tier1 and tier2.
>
> Thanks,
> Daniel
>


RE: [11u] RFR 8259886: Improve SSL session cache performance and scalability

2021-03-15 Thread Hohensee, Paul
The changes to Cache.java look fine, but please include CacheBench.java. I'd 
like to see 11u to stand on its own without reference to later releases, plus I 
believe the 11u maintainers prefer to backport as much of a patch as possible.

Thanks,
Paul

-Original Message-
From: security-dev  on behalf of Daniel 
Jeliński 
Date: Tuesday, March 9, 2021 at 3:37 PM
To: "jdk-updates-...@openjdk.java.net" 
Cc: "security-dev@openjdk.java.net" 
Subject: [11u] RFR 8259886: Improve SSL session cache performance and 
scalability

Hi,
Please review this 11u backport; this is the same patch as for head,
except for microbenchmark makefile changes that did not apply because
the file doesn't exist in 11u, and the actual microbenchmark, which
would only add weight for no benefit.

Bug: https://bugs.openjdk.java.net/browse/JDK-8259886
webrev: https://djelinski.github.io/8259886-11u/webrev/index.html

Testing: Linux x86_64 tier1 and tier2.

Thanks,
Daniel



[11u] RFR 8259886: Improve SSL session cache performance and scalability

2021-03-09 Thread Daniel Jeliński
Hi,
Please review this 11u backport; this is the same patch as for head,
except for microbenchmark makefile changes that did not apply because
the file doesn't exist in 11u, and the actual microbenchmark, which
would only add weight for no benefit.

Bug: https://bugs.openjdk.java.net/browse/JDK-8259886
webrev: https://djelinski.github.io/8259886-11u/webrev/index.html

Testing: Linux x86_64 tier1 and tier2.

Thanks,
Daniel


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v6]

2021-03-06 Thread djelinski
> Under certain load, MemoryCache operations take a substantial fraction of the 
> time needed to complete SSL handshakes. This series of patches improves 
> performance characteristics of MemoryCache, at the cost of a functional 
> change: expired entries are no longer guaranteed to be removed before live 
> ones. Unused entries are still removed before used ones, and cache 
> performance no longer depends on its capacity.
> 
> First patch in the series contains a benchmark that can be run with `make 
> test TEST="micro:CacheBench"`.
> Baseline results before any MemoryCache changes:
> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
> there's a nonlinear performance drop between 20480 and 204800 entries, 
> probably attributable to CPU cache thrashing. Beyond 204800 entries the cache 
> scales more linearly.
> 
> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
> only copy one:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
> The third patch improves worst-case times on a mostly idle cache by 
> scattering removal of expired entries over multiple `put` calls. It does not 
> affect performance of an overloaded cache.
> 
> The 4th patch removes all code that clears cached values before handing them 
> over to the GC. [This 
> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>  stated that clearing values was supposed to be a GC performance 
> optimization. It wasn't. Benchmark results after that commit:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
> I wasn't expecting that much of an improvement, and don't know how to explain 
> it.
> 
> The 40ns difference between cache with and without a timeout can be 
> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
> my VM.

djelinski has updated the pull request incrementally with one additional commit 
since the last revision:

  Update copyright year

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2255/files
  - new: https://git.openjdk.java.net/jdk/pull/2255/files/f9bc386a..d5c39a45

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2255.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2255/head:pull/2255

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v5]

2021-03-05 Thread Xue-Lei Andrew Fan
On Tue, 2 Mar 2021 21:00:04 GMT, djelinski 
 wrote:

>> Under certain load, MemoryCache operations take a substantial fraction of 
>> the time needed to complete SSL handshakes. This series of patches improves 
>> performance characteristics of MemoryCache, at the cost of a functional 
>> change: expired entries are no longer guaranteed to be removed before live 
>> ones. Unused entries are still removed before used ones, and cache 
>> performance no longer depends on its capacity.
>> 
>> First patch in the series contains a benchmark that can be run with `make 
>> test TEST="micro:CacheBench"`.
>> Baseline results before any MemoryCache changes:
>> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
>> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
>> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
>> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
>> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
>> there's a nonlinear performance drop between 20480 and 204800 entries, 
>> probably attributable to CPU cache thrashing. Beyond 204800 entries the 
>> cache scales more linearly.
>> 
>> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
>> only copy one:
>> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
>> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
>> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
>> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
>> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
>> The third patch improves worst-case times on a mostly idle cache by 
>> scattering removal of expired entries over multiple `put` calls. It does not 
>> affect performance of an overloaded cache.
>> 
>> The 4th patch removes all code that clears cached values before handing them 
>> over to the GC. [This 
>> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>>  stated that clearing values was supposed to be a GC performance 
>> optimization. It wasn't. Benchmark results after that commit:
>> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
>> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
>> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
>> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
>> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
>> I wasn't expecting that much of an improvement, and don't know how to 
>> explain it.
>> 
>> The 40ns difference between cache with and without a timeout can be 
>> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
>> my VM.
>
> djelinski has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Avoid unproductive cache scans
>  - Revert Cache changes

I would also update the copyright year to 2021.  Otherwise, looks good to me.  
Thank you!

-

Marked as reviewed by xuelei (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-03-05 Thread djelinski
On Mon, 22 Feb 2021 21:31:21 GMT, Xue-Lei Andrew Fan  wrote:

>> Actually there's a much easier solution to reduce the number of slow 
>> `put()`s without making any behavioral changes.
>> The cache object could store the earliest expire time, and then exit 
>> `expungeExpiredEntries()` early when current time is earlier than the 
>> earliest expire time - when it is, we know that there are no expired items 
>> in the queue and we can skip the scan entirely.
>> @XueleiFan do you think the above is worth exploring?
>
>> Actually there's a much easier solution to reduce the number of slow 
>> `put()`s without making any behavioral changes.
>> The cache object could store the earliest expire time, and then exit 
>> `expungeExpiredEntries()` early when current time is earlier than the 
>> earliest expire time - when it is, we know that there are no expired items 
>> in the queue and we can skip the scan entirely.
>> @XueleiFan do you think the above is worth exploring?
> 
> Definitely, I think it is a good improvement.  Actually, it is a surprise to 
> me that the current code is not working this way.
> 
> Sorry, I was/am on vacation, and the review could be delayed for a few days.

ping @XueleiFan, I'd appreciate another review.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-03-02 Thread djelinski
On Mon, 22 Feb 2021 21:31:21 GMT, Xue-Lei Andrew Fan  wrote:

>> Actually there's a much easier solution to reduce the number of slow 
>> `put()`s without making any behavioral changes.
>> The cache object could store the earliest expire time, and then exit 
>> `expungeExpiredEntries()` early when current time is earlier than the 
>> earliest expire time - when it is, we know that there are no expired items 
>> in the queue and we can skip the scan entirely.
>> @XueleiFan do you think the above is worth exploring?
>
>> Actually there's a much easier solution to reduce the number of slow 
>> `put()`s without making any behavioral changes.
>> The cache object could store the earliest expire time, and then exit 
>> `expungeExpiredEntries()` early when current time is earlier than the 
>> earliest expire time - when it is, we know that there are no expired items 
>> in the queue and we can skip the scan entirely.
>> @XueleiFan do you think the above is worth exploring?
> 
> Definitely, I think it is a good improvement.  Actually, it is a surprise to 
> me that the current code is not working this way.
> 
> Sorry, I was/am on vacation, and the review could be delayed for a few days.

I reverted all earlier Cache changes, and added a new commit that caches the 
earliest expire time of all cached items. The observable behavior of the new 
code is identical to original - items are removed from cache at exactly the 
same time as before; we only skip scanning the cache when we know that there 
are no expired items inside.

The performance is substantially improved. There can be at most `cache size` 
scans in every `cache lifetime` period, which is roughly one scan every 4 
seconds with the default SSL session cache settings. This is much better than 
possibly scanning on every `put()` that was possible before the changes.

My reduced set of benchmarks produced the following values:
Benchmark   (size)  (timeout)  Mode  CntScore   Error  Units
CacheBench.put   20480  86400  avgt   25  148.345 ? 1.970  ns/op
CacheBench.put   20480  0  avgt   25  108.598 ? 3.787  ns/op
CacheBench.put  204800  86400  avgt   25  151.318 ? 1.872  ns/op
CacheBench.put  204800  0  avgt   25  106.650 ? 1.080  ns/op 
which is comparable to what was observed with the previous commits.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v5]

2021-03-02 Thread djelinski
> Under certain load, MemoryCache operations take a substantial fraction of the 
> time needed to complete SSL handshakes. This series of patches improves 
> performance characteristics of MemoryCache, at the cost of a functional 
> change: expired entries are no longer guaranteed to be removed before live 
> ones. Unused entries are still removed before used ones, and cache 
> performance no longer depends on its capacity.
> 
> First patch in the series contains a benchmark that can be run with `make 
> test TEST="micro:CacheBench"`.
> Baseline results before any MemoryCache changes:
> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
> there's a nonlinear performance drop between 20480 and 204800 entries, 
> probably attributable to CPU cache thrashing. Beyond 204800 entries the cache 
> scales more linearly.
> 
> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
> only copy one:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
> The third patch improves worst-case times on a mostly idle cache by 
> scattering removal of expired entries over multiple `put` calls. It does not 
> affect performance of an overloaded cache.
> 
> The 4th patch removes all code that clears cached values before handing them 
> over to the GC. [This 
> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>  stated that clearing values was supposed to be a GC performance 
> optimization. It wasn't. Benchmark results after that commit:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
> I wasn't expecting that much of an improvement, and don't know how to explain 
> it.
> 
> The 40ns difference between cache with and without a timeout can be 
> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
> my VM.

djelinski has updated the pull request incrementally with two additional 
commits since the last revision:

 - Avoid unproductive cache scans
 - Revert Cache changes

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2255/files
  - new: https://git.openjdk.java.net/jdk/pull/2255/files/c7b064f0..f9bc386a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=03-04

  Stats: 96 lines in 1 file changed: 59 ins; 3 del; 34 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2255.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2255/head:pull/2255

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-22 Thread Xue-Lei Andrew Fan
On Mon, 22 Feb 2021 20:36:53 GMT, djelinski 
 wrote:

> Actually there's a much easier solution to reduce the number of slow `put()`s 
> without making any behavioral changes.
> The cache object could store the earliest expire time, and then exit 
> `expungeExpiredEntries()` early when current time is earlier than the 
> earliest expire time - when it is, we know that there are no expired items in 
> the queue and we can skip the scan entirely.
> @XueleiFan do you think the above is worth exploring?
Definitely, I think it is a good improvement.  Actually, it is a surprise to me 
that the current code is not working this way.

Sorry, I was/am on vacation, and the review could be delayed for a few days.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-22 Thread djelinski
On Tue, 16 Feb 2021 19:38:50 GMT, djelinski 
 wrote:

>>> I may think it differently. It may be hard to know the future frequency of 
>>> an cached item based on the past behaviors. For the above case, I'm not 
>>> sure that K=3 is used less frequently than K=1. Maybe, next few seconds, 
>>> K=1 could be more frequently.
>> 
>> I agree that such prediction might not be 100% accurate. But, quick google 
>> search reveals that there are 
>> [many](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf) 
>> [articles](https://link.springer.com/article/10.1007/PL9255) that claim 
>> that LRU caches offer better hit rates than FIFO, especially for in-memory 
>> caches.
>>> I would like a solution to following the timeout specification: keep the 
>>> newer items if possible.
>> 
>> That's a trivial change; all we need to do is change `true` to `false` 
>> [here](https://github.com/openjdk/jdk/blob/abe0e238bd25adb1ddd2b655613899bfa063cd85/src/java.base/share/classes/sun/security/util/Cache.java#L268).
>>  But, as stated above, LRU is better than FIFO, so I wouldn't want to do 
>> that.
>> 
>> I could keep LRU and add another linked list that would store items in the 
>> order of their expiration dates; then we could quickly scan that list for 
>> expired items. Note: the order of expiration dates is not necessarily the 
>> order of insertion, because 1) `System.currentTimeMillis()` is not monotonic 
>> - it can move back when something changes the system time, 2) the expiration 
>> date is calculated at insertion time, so if someone changes the timeout on a 
>> non-empty cache, new items may have shorter expiration time than old ones. 
>> So, I'd either need to address that first (change `currentTimeMillis` to 
>> `nanoTime` and store creation time instead of expiration time), or use 
>> insertion sort for adding items (which would get very slow if either of the 
>> above mentioned situations happened).
>> Let me know your thoughts.
>
> Well, if removing all expired items before evicting live ones is a 
> non-negotiable, implementing all operations in constant time is much easier 
> with FIFO, where we only need to keep one item order.
> The new commits contain the following changes:
> - use `nanoTime` instead of `currentTimeMillis` to make sure that time never 
> goes back
> - store insertion time instead of expiration time, so that older items always 
> expire before newer ones, even when timeout is changed
> - change internal hash map to store (and evict) items in insertion (FIFO) 
> order
> - always stop scanning entries after finding the first non-expired item, 
> because subsequent items are now guaranteed to have later expiration dates, 
> and collected soft references are handled by reference queue.
> 
> tier1 and jdk_security tests passed; benchmark results show only minimal 
> changes. I verified that none of the classes using `Cache` mentions LRU, 
> looks like this was an implementation detail.

Actually there's a much easier solution to reduce the number of slow `put()`s 
without making any behavioral changes.
The cache object could store the earliest expire time, and then exit 
`expungeExpiredEntries()` early when current time is earlier than the earliest 
expire time - when it is, we know that there are no expired items in the queue 
and we can skip the scan entirely.
@XueleiFan do you think the above is worth exploring?

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-16 Thread djelinski
On Wed, 10 Feb 2021 19:19:33 GMT, djelinski 
 wrote:

>>> Thanks for your review! Some comments below.
>>> 
>>> > A full handshake or OCSP status grabbing could counteract all the 
>>> > performance gain with the cache update.
>>> 
>>> Yes, but that's unlikely. Note that K=3 is before K=1 in the queue only 
>>> because 3 wasn't used since 1 was last used. This means that either K=3 is 
>>> used less frequently than K=1, or that all cached items are in active use. 
>>> In the former case we don't lose much by dropping K=3 (granted, there's 
>>> nothing to offset that). In the latter we are dealing with full cache at 
>>> all times, which means that most `put()`s would scan the queue, and we will 
>>> gain a lot by finishing faster.
>> 
>> I may think it differently.  It may be hard to know the future frequency of 
>> an cached item based on the past behaviors.  For the above case, I'm not 
>> sure that K=3 is used less frequently than K=1.  Maybe, next few seconds, 
>> K=1 could be more frequently.
>> 
>> I would like a solution to following the timeout specification: keep the 
>> newer items if possible.
>> 
>>> 
>>> > get() [..] without [..] change the order of the queue
>>> 
>>> If we do that, frequently used entries will be evicted at the same age as 
>>> never used ones. This means we will have to recompute (full handshake/fresh 
>>> OCSP) both the frequently used and the infrequently used entries. It's 
>>> better to recompute only the infrequently used ones, and reuse the 
>>> frequently used as long as possible - we will do less work that way.
>>> That's probably the reason why a `LinkedHashMap` with `accessOrder=true` 
>>> was chosen as the backing store implementation originally.
>>> 
>> 
>> See above.  It may be true for some case to determine the frequency, but 
>> Cache is a general class and we may want to be more careful about if we are 
>> really be able to determine the frequency within the Cache implementation.
>> 
>>> > get() performance is more important [..] so we should try to keep the 
>>> > cache small if possible
>>> 
>>> I don't see the link; could you explain?
>>> 
>> 
>> link? Did you mean the link to get() method?  It is a method in the Cache 
>> class.
>> 
>>> > In the update, the SoftReference.clear() get removed. I'm not sure of the 
>>> > impact of the enqueued objects any longer. In theory, it could improve 
>>> > the memory use, which could counteract the performance gain in some 
>>> > situation.
>>> 
>>> That's the best part: no objects ever get enqueued! We only called 
>>> `clear()` right before losing the last reference to `SoftCacheEntry` (which 
>>> is the `SoftReference`). When GC collects the `SoftReference`, it does not 
>>> enqueue anything. GC only enqueues the `SoftReference` when it collects the 
>>> referenced object (session / OCSP response) without collecting the 
>>> `SoftReference` (cache entry) itself.
>>> This is [documented 
>>> behavior](https://docs.oracle.com/javase/7/docs/api/java/lang/ref/package-summary.html):
>>>  _If a registered reference becomes unreachable itself, then it will never 
>>> be enqueued._
>>> 
>> 
>> I need more time for this section.
>> 
>>> > Could you edit the bug
>>> 
>>> I'd need an account on the bug tracker first.
>> 
>> Okay.  No worries, I will help you if we could get an agreement about the 
>> update.
>
>> I may think it differently. It may be hard to know the future frequency of 
>> an cached item based on the past behaviors. For the above case, I'm not sure 
>> that K=3 is used less frequently than K=1. Maybe, next few seconds, K=1 
>> could be more frequently.
> 
> I agree that such prediction might not be 100% accurate. But, quick google 
> search reveals that there are 
> [many](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf) 
> [articles](https://link.springer.com/article/10.1007/PL9255) that claim 
> that LRU caches offer better hit rates than FIFO, especially for in-memory 
> caches.
>> I would like a solution to following the timeout specification: keep the 
>> newer items if possible.
> 
> That's a trivial change; all we need to do is change `true` to `false` 
> [here](https://github.com/openjdk/jdk/blob/abe0e238bd25adb1ddd2b655613899bfa063cd85/src/java.base/share/classes/sun/security/util/Cache.java#L268).
>  But, as stated above, LRU is better than FIFO, so I wouldn't want to do that.
> 
> I could keep LRU and add another linked list that would store items in the 
> order of their expiration dates; then we could quickly scan that list for 
> expired items. Note: the order of expiration dates is not necessarily the 
> order of insertion, because 1) `System.currentTimeMillis()` is not monotonic 
> - it can move back when something changes the system time, 2) the expiration 
> date is calculated at insertion time, so if someone changes the timeout on a 
> non-empty cache, new items may have shorter expiration time than old ones. 
> So, I'd either need to address that first (change `curr

Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v4]

2021-02-16 Thread djelinski
> Under certain load, MemoryCache operations take a substantial fraction of the 
> time needed to complete SSL handshakes. This series of patches improves 
> performance characteristics of MemoryCache, at the cost of a functional 
> change: expired entries are no longer guaranteed to be removed before live 
> ones. Unused entries are still removed before used ones, and cache 
> performance no longer depends on its capacity.
> 
> First patch in the series contains a benchmark that can be run with `make 
> test TEST="micro:CacheBench"`.
> Baseline results before any MemoryCache changes:
> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
> there's a nonlinear performance drop between 20480 and 204800 entries, 
> probably attributable to CPU cache thrashing. Beyond 204800 entries the cache 
> scales more linearly.
> 
> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
> only copy one:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
> The third patch improves worst-case times on a mostly idle cache by 
> scattering removal of expired entries over multiple `put` calls. It does not 
> affect performance of an overloaded cache.
> 
> The 4th patch removes all code that clears cached values before handing them 
> over to the GC. [This 
> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>  stated that clearing values was supposed to be a GC performance 
> optimization. It wasn't. Benchmark results after that commit:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
> I wasn't expecting that much of an improvement, and don't know how to explain 
> it.
> 
> The 40ns difference between cache with and without a timeout can be 
> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
> my VM.

djelinski has updated the pull request incrementally with two additional 
commits since the last revision:

 - Switch cache to FIFO order
 - Order of expiration should match insertion

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2255/files
  - new: https://git.openjdk.java.net/jdk/pull/2255/files/abe0e238..c7b064f0

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=02-03

  Stats: 33 lines in 1 file changed: 1 ins; 3 del; 29 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2255.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2255/head:pull/2255

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-10 Thread djelinski
On Wed, 10 Feb 2021 00:44:57 GMT, Xue-Lei Andrew Fan  wrote:

> I may think it differently. It may be hard to know the future frequency of an 
> cached item based on the past behaviors. For the above case, I'm not sure 
> that K=3 is used less frequently than K=1. Maybe, next few seconds, K=1 could 
> be more frequently.

I agree that such prediction might not be 100% accurate. But, quick google 
search reveals that there are 
[many](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf) 
[articles](https://link.springer.com/article/10.1007/PL9255) that claim 
that LRU caches offer better hit rates than FIFO, especially for in-memory 
caches.
> I would like a solution to following the timeout specification: keep the 
> newer items if possible.

That's a trivial change; all we need to do is change `true` to `false` 
[here](https://github.com/openjdk/jdk/blob/abe0e238bd25adb1ddd2b655613899bfa063cd85/src/java.base/share/classes/sun/security/util/Cache.java#L268).
 But, as stated above, LRU is better than FIFO, so I wouldn't want to do that.

I could keep LRU and add another linked list that would store items in the 
order of their expiration dates; then we could quickly scan that list for 
expired items. Note: the order of expiration dates is not necessarily the order 
of insertion, because 1) `System.currentTimeMillis()` is not monotonic - it can 
move back when something changes the system time, 2) the expiration date is 
calculated at insertion time, so if someone changes the timeout on a non-empty 
cache, new items may have shorter expiration time than old ones. So, I'd either 
need to address that first (change `currentTimeMillis` to `nanoTime` and store 
creation time instead of expiration time), or use insertion sort for adding 
items (which would get very slow if either of the above mentioned situations 
happened).
Let me know your thoughts.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-09 Thread Xue-Lei Andrew Fan
On Tue, 9 Feb 2021 08:44:28 GMT, djelinski 
 wrote:

> So, how do we want to proceed here? Is the proposed solution acceptable? If 
> not, what needs to change? if yes, what do I need to do next?

For me, it is a pretty good solution, but I have some concerns.  I appreciate 
if you would like to read my comment and see if we could have an agreement.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-09 Thread Xue-Lei Andrew Fan
On Thu, 4 Feb 2021 20:45:55 GMT, djelinski 
 wrote:

> Thanks for your review! Some comments below.
> 
> > A full handshake or OCSP status grabbing could counteract all the 
> > performance gain with the cache update.
> 
> Yes, but that's unlikely. Note that K=3 is before K=1 in the queue only 
> because 3 wasn't used since 1 was last used. This means that either K=3 is 
> used less frequently than K=1, or that all cached items are in active use. In 
> the former case we don't lose much by dropping K=3 (granted, there's nothing 
> to offset that). In the latter we are dealing with full cache at all times, 
> which means that most `put()`s would scan the queue, and we will gain a lot 
> by finishing faster.

I may think it differently.  It may be hard to know the future frequency of an 
cached item based on the past behaviors.  For the above case, I'm not sure that 
K=3 is used less frequently than K=1.  Maybe, next few seconds, K=1 could be 
more frequently.

I would like a solution to following the timeout specification: keep the newer 
items if possible.

> 
> > get() [..] without [..] change the order of the queue
> 
> If we do that, frequently used entries will be evicted at the same age as 
> never used ones. This means we will have to recompute (full handshake/fresh 
> OCSP) both the frequently used and the infrequently used entries. It's better 
> to recompute only the infrequently used ones, and reuse the frequently used 
> as long as possible - we will do less work that way.
> That's probably the reason why a `LinkedHashMap` with `accessOrder=true` was 
> chosen as the backing store implementation originally.
> 

See above.  It may be true for some case to determine the frequency, but Cache 
is a general class and we may want to be more careful about if we are really be 
able to determine the frequency within the Cache implementation.

> > get() performance is more important [..] so we should try to keep the cache 
> > small if possible
> 
> I don't see the link; could you explain?
> 

link? Did you mean the link to get() method?  It is a method in the Cache class.

> > In the update, the SoftReference.clear() get removed. I'm not sure of the 
> > impact of the enqueued objects any longer. In theory, it could improve the 
> > memory use, which could counteract the performance gain in some situation.
> 
> That's the best part: no objects ever get enqueued! We only called `clear()` 
> right before losing the last reference to `SoftCacheEntry` (which is the 
> `SoftReference`). When GC collects the `SoftReference`, it does not enqueue 
> anything. GC only enqueues the `SoftReference` when it collects the 
> referenced object (session / OCSP response) without collecting the 
> `SoftReference` (cache entry) itself.
> This is [documented 
> behavior](https://docs.oracle.com/javase/7/docs/api/java/lang/ref/package-summary.html):
>  _If a registered reference becomes unreachable itself, then it will never be 
> enqueued._
> 

I need more time for this section.

> > Could you edit the bug
> 
> I'd need an account on the bug tracker first.

Okay.  No worries, I will help you if we could get an agreement about the 
update.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-09 Thread djelinski
On Thu, 4 Feb 2021 20:45:55 GMT, djelinski 
 wrote:

>> Thank you for the comment. The big picture is more clear to me now.
>> 
>>> Example 2:
>>> Old implementation will get:
>>> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=9, exp=16|
>>>
>>> New implementation will get:
>>> |K=5, exp=12|K=7, exp=14|K=1, exp=8(expired)|K=9, exp=16|
>> 
>> K=3 is not expired yet, but get removed, while K=1 is kept.  This behavior 
>> change may cause more overall performance hurt than improving the cache 
>> put/get performance.  For example, it need to grab the value remotely.   A 
>> full handshake or OCSP status grabbing could counteract all the performance 
>> gain with the cache update.
>> 
>>> All calls to put() remove expired items from the front of the queue, and 
>>> never perform a full scan. get() calls shuffle the queue, moving the 
>>> accessed item to the back. Compare this to original code where put() only 
>>> removed expired items when the cache overflowed, and scanned the entire 
>>> cache.
>> 
>> I think the idea that put() remove expired items from the front of the queue 
>> is good.  I was wondering if it is an option to have the get() method that 
>> removed expired items until the 1st un-expired item, without scan the full 
>> queue and change the order of the queue.  But there is still an issue that 
>> the SoftReference may have clear an item, which may be still valid.
>> 
>> In general, I think the get() performance is more important than put() 
>> method, as get() is called more frequently.  So we should try to keep the 
>> cache small if possible.
>> 
 increase the size to some big scales, like 2M and 20M
>>>
>>> Can do. Do you think it makes sense to also benchmark the scenario where GC 
>>> kicks in and collects soft references?
>> 
>> In the update, the SoftReference.clear() get removed.  I'm not sure of the 
>> impact of the enqueued objects any longer.  In theory, it could improve the 
>> memory use, which could counteract the performance gain in some situation.
>> 
>>> Also, what do you think about the changes done in Do not invalidate objects 
>>> before GC 5859a03 commit? 
>> 
>> See above, it is a concern to me that the soft reference cannot be cleared 
>> with this update.
>> 
>>> How do I file a CSR?
>> 
>> Could you edit the bug: https://bugs.openjdk.java.net/browse/JDK-8259886?  
>> In the more drop down menu, there is a "Create CSR" option.  You can do it 
>> if we have an agreement about the solution and impact.
>
> Thanks for your review! Some comments below.
>> A full handshake or OCSP status grabbing could counteract all the 
>> performance gain with the cache update.
> 
> Yes, but that's unlikely. Note that K=3 is before K=1 in the queue only 
> because 3 wasn't used since 1 was last used. This means that either K=3 is 
> used less frequently than K=1, or that all cached items are in active use. In 
> the former case we don't lose much by dropping K=3 (granted, there's nothing 
> to offset that). In the latter we are dealing with full cache at all times, 
> which means that most `put()`s would scan the queue, and we will gain a lot 
> by finishing faster.
>> get() [..] without [..] change the order of the queue
> 
> If we do that, frequently used entries will be evicted at the same age as 
> never used ones. This means we will have to recompute (full handshake/fresh 
> OCSP) both the frequently used and the infrequently used entries. It's better 
> to recompute only the infrequently used ones, and reuse the frequently used 
> as long as possible - we will do less work that way.
> That's probably the reason why a `LinkedHashMap` with `accessOrder=true` was 
> chosen as the backing store implementation originally.
>> get() performance is more important [..] so we should try to keep the cache 
>> small if possible
> 
> I don't see the link; could you explain?
>> In the update, the SoftReference.clear() get removed. I'm not sure of the 
>> impact of the enqueued objects any longer. In theory, it could improve the 
>> memory use, which could counteract the performance gain in some situation.
> 
> That's the best part: no objects ever get enqueued! We only called `clear()` 
> right before losing the last reference to `SoftCacheEntry` (which is the 
> `SoftReference`). When GC collects the `SoftReference`, it does not enqueue 
> anything. GC only enqueues the `SoftReference` when it collects the 
> referenced object (session / OCSP response) without collecting the 
> `SoftReference` (cache entry) itself.
> This is [documented 
> behavior](https://docs.oracle.com/javase/7/docs/api/java/lang/ref/package-summary.html):
>  _If a registered reference becomes unreachable itself, then it will never be 
> enqueued._
>> Could you edit the bug
> 
> I'd need an account on the bug tracker first.

So, how do we want to proceed here? Is the proposed solution acceptable? If 
not, what needs to change? if yes, what do I need to do next?

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-04 Thread djelinski
On Thu, 4 Feb 2021 19:36:24 GMT, Xue-Lei Andrew Fan  wrote:

>> Added benchmarks for get & remove. Added tests for 5M cache size. Switched 
>> time units to nanoseconds. Results:
>> Benchmark   (size)  (timeout)  Mode  CntScore   Error  Units
>> CacheBench.get   20480  86400  avgt   25   62.999 ? 2.017  ns/op
>> CacheBench.get   20480  0  avgt   25   41.519 ? 1.113  ns/op
>> CacheBench.get  204800  86400  avgt   25   67.995 ? 4.530  ns/op
>> CacheBench.get  204800  0  avgt   25   46.439 ? 2.222  ns/op
>> CacheBench.get 512  86400  avgt   25   72.516 ? 0.759  ns/op
>> CacheBench.get 512  0  avgt   25   53.471 ? 0.491  ns/op
>> CacheBench.put   20480  86400  avgt   25  117.117 ? 3.424  ns/op
>> CacheBench.put   20480  0  avgt   25   73.582 ? 1.484  ns/op
>> CacheBench.put  204800  86400  avgt   25  116.983 ? 0.743  ns/op
>> CacheBench.put  204800  0  avgt   25   73.945 ? 0.515  ns/op
>> CacheBench.put 512  86400  avgt   25  230.878 ? 7.582  ns/op
>> CacheBench.put 512  0  avgt   25  192.526 ? 7.048  ns/op
>> CacheBench.remove20480  86400  avgt   25   39.048 ? 2.036  ns/op
>> CacheBench.remove20480  0  avgt   25   36.293 ? 0.281  ns/op
>> CacheBench.remove   204800  86400  avgt   25   43.899 ? 0.895  ns/op
>> CacheBench.remove   204800  0  avgt   25   43.046 ? 0.759  ns/op
>> CacheBench.remove  512  86400  avgt   25   51.896 ? 0.640  ns/op
>> CacheBench.remove  512  0  avgt   25   51.537 ? 0.536  ns/op
>
> Thank you for the comment. The big picture is more clear to me now.
> 
>> Example 2:
>> Old implementation will get:
>> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=9, exp=16|
>>
>> New implementation will get:
>> |K=5, exp=12|K=7, exp=14|K=1, exp=8(expired)|K=9, exp=16|
> 
> K=3 is not expired yet, but get removed, while K=1 is kept.  This behavior 
> change may cause more overall performance hurt than improving the cache 
> put/get performance.  For example, it need to grab the value remotely.   A 
> full handshake or OCSP status grabbing could counteract all the performance 
> gain with the cache update.
> 
>> All calls to put() remove expired items from the front of the queue, and 
>> never perform a full scan. get() calls shuffle the queue, moving the 
>> accessed item to the back. Compare this to original code where put() only 
>> removed expired items when the cache overflowed, and scanned the entire 
>> cache.
> 
> I think the idea that put() remove expired items from the front of the queue 
> is good.  I was wondering if it is an option to have the get() method that 
> removed expired items until the 1st un-expired item, without scan the full 
> queue and change the order of the queue.  But there is still an issue that 
> the SoftReference may have clear an item, which may be still valid.
> 
> In general, I think the get() performance is more important than put() 
> method, as get() is called more frequently.  So we should try to keep the 
> cache small if possible.
> 
>>> increase the size to some big scales, like 2M and 20M
>>
>> Can do. Do you think it makes sense to also benchmark the scenario where GC 
>> kicks in and collects soft references?
> 
> In the update, the SoftReference.clear() get removed.  I'm not sure of the 
> impact of the enqueued objects any longer.  In theory, it could improve the 
> memory use, which could counteract the performance gain in some situation.
> 
>> Also, what do you think about the changes done in Do not invalidate objects 
>> before GC 5859a03 commit? 
> See above, it is a concern to me that the soft reference cannot be cleared 
> with this update.
> 
>> How do I file a CSR?
> Could you edit the bug: https://bugs.openjdk.java.net/browse/JDK-8259886?  In 
> the more drop down menu, there is a "Create CSR" option.  You can do it if we 
> have an agreement about the solution and impact.

Thanks for your review! Some comments below.
> A full handshake or OCSP status grabbing could counteract all the performance 
> gain with the cache update.

Yes, but that's unlikely. Note that K=3 is before K=1 in the queue only because 
3 wasn't used since 1 was last used. This means that either K=3 is used less 
frequently than K=1, or that all cached items are in active use. In the former 
case we don't lose much by dropping K=3, and in the latter we are dealing with 
full cache at all times, which means that most `put()`s remove un-expired items 
anyway.
> get() [..] without [..] change the order of the queue

If we do that, frequently used entries will be evicted at the same age as never 
used ones. This means we will have to recompute (full handshake/fresh OCSP) 
both the frequently used and the infrequently used entries. It's better to 
recompute only the infrequently used ones, and reuse the frequently used as 
long as possible - we will do less work that way.
> get() performance 

Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-04 Thread Xue-Lei Andrew Fan
On Thu, 4 Feb 2021 19:17:21 GMT, djelinski 
 wrote:

>>> the benchmark performance improvement is a trade-off between CPU and 
>>> memory, by keeping expired entries while putting a new entry in the cache
>> 
>> Not exactly. The memory use is capped by cache size. The patch is a trade 
>> off between the cache's hit/miss ratio and CPU; we will get faster cache 
>> access at the cost of more frequent cache misses.
>> 
>> All calls to `put()` remove expired items from the front of the queue, and 
>> never perform a full scan. `get()` calls shuffle the queue, moving the 
>> accessed item to the back. Compare this to original code where `put()` only 
>> removed expired items when the cache overflowed, and scanned the entire 
>> cache.
>> Let me give some examples. 
>> **Example 1**: insertions at a fast pace leading to cache overflows and no 
>> expirations. Here the new implementation improves performance. Consider a 
>> cache with size=4, timeout=10, and the following sequence of events:
>> T=1, put(1);
>> T=2, put(2);
>> T=3, put(3);
>> T=4, put(4);
>> Cache contents after these calls (same in old and new scenario). Queue 
>> order: least recently accessed items on the left, most recently accessed on 
>> the right. _K_ denotes cache key, _exp_ denotes entry expiration time and is 
>> equal to insertion time _T_ plus timeout:
>> 
>> |K=1, exp=11|K=2, exp=12|K=3, exp=13|K=4, exp=14|
>> 
>> If we now add another item to the queue, it will overflow. Here's where the 
>> implementations behave differently, but the outcome is identical: old one 
>> scans the entire list for expired entries; new one improves performance by 
>> ending the search for expired entries after encountering the first 
>> non-expired entry (which is the first entry in the above example). The end 
>> result is the same in both cases - oldest (least recently accessed) item is 
>> dropped:
>> T=5, put(5)
>> 
>> |K=2, exp=12|K=3, exp=13|K=4, exp=14|K=5, exp=15|
>> 
>> **Example 2**: insertions at a moderate pace, with interleaved reads. Here 
>> the new implementation improves performance, but at a possible cost of 
>> wasting cache capacity on expired entries. Consider a cache with size=4, 
>> timeout=7, and the following sequence of events:
>> T=1, put(1);
>> T=3, put(3);
>> T=5, put(5);
>> T=7, put(7);
>> T=7, get(1);
>> Cache contents after these calls:
>> 
>> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=1, exp=8|
>> 
>> `get(1)` operation moved item with K=1 to the back of the queue.
>> 
>> If we wait for item with K=1 to expire and then add another item to the 
>> queue, it will overflow. Here's where the implementations behave 
>> differently, and the outcome is different: old one scans the entire list for 
>> expired entries, finds entry with K=1 and drops it; new one gives up after 
>> first non-expired entry (which is the first entry), and drops the first 
>> entry.
>> 
>> So, when we perform:
>> T=9, put(9);
>> 
>> Old implementation will get:
>> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=9, exp=16|
>> 
>> New implementation will get:
>> |K=5, exp=12|K=7, exp=14|K=1, exp=8(expired)|K=9, exp=16|
>> 
>> Note that:
>> - an attempt to retrieve expired item (i.e. `get(1)`) will immediately 
>> remove that item from cache, making room for other items
>> - retrieving a non-expired item will move it to the back of the queue, 
>> behind all expired items
>> 
>> **Example 3**: insertions at a slow pace, where most items expire before 
>> queue overflows. Here the new implementation improves memory consumption. 
>> Consider a cache with size=4, timeout=1, and the following sequence of 
>> events:
>> T=1, put(1);
>> T=3, put(3);
>> T=5, put(5);
>> T=7, put(7);
>> Every cache item is expired at then point when a new one is added. Old 
>> implementation only removes expired entries when cache overflows, so all 
>> entries will still be there:
>> 
>> |K=1, exp=2(expired)|K=3, exp=4(expired)|K=5, exp=6(expired)|K=7, exp=8|
>> 
>> New implementation removes expired entries on every put, so after the last 
>> put only one entry is in the cache:
>> 
>> |K=7, exp=8|
>> 
>> After another put the old implementation will encounter a cache overflow and 
>> remove all expired items.
>> 
>> Let me know if that helped.
>> 
>>> add two more types of benchmark: get the entries and remove the entries
>> 
>> Both these operations are constant-time, both before and after my changes. 
>> Do you expect to see some oddities here, or do we just want a benchmark that 
>> could be used to compare other implementations?
>> 
>>> increase the size to some big scales, like 2M and 20M
>> 
>> Can do. Do you think it makes sense to also benchmark the scenario where GC 
>> kicks in and collects soft references?
>> 
>>> it may change the behavior of a few JDK components
>> 
>> Of all uses of Cache, only `SSLSessionContextImpl` (TLS session cache), 
>> `StatusResponseManager` (OCSP stapling) and `LDAPCertStoreImpl` (I'm not 
>> familiar with that one) set expiration timeout; when the timeo

Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-04 Thread djelinski
On Tue, 2 Feb 2021 12:19:22 GMT, djelinski 
 wrote:

>> If I get the patch right, the benchmark performance improvement is a 
>> trade-off between CPU and memory, by keeping expired entries while putting a 
>> new entry in the cache.  I'm not very sure of the performance impact on 
>> memory and GC collections.  Would you mind add two more types of benchmark: 
>> get the entries and remove the entries, for cases that there are 1/10, 1/4, 
>> 1/3 and 1/2 expired entries in the cache?  And increase the size to some big 
>> scales, like 2M and 20M.
>> 
>> It looks like a spec update as it may change the behavior of a few JDK 
>> components (TLS session cache, OCSP stapling response cache, cert store 
>> cache, certificate factory, etc), because of "expired entries are no longer 
>> guaranteed to be removed before live ones".  I'm not very sure of the 
>> impact. I may suggest to file a CSR and have more eyes to check the 
>> compatibility impact before moving forward.
>
>> the benchmark performance improvement is a trade-off between CPU and memory, 
>> by keeping expired entries while putting a new entry in the cache
> 
> Not exactly. The memory use is capped by cache size. The patch is a trade off 
> between the cache's hit/miss ratio and CPU; we will get faster cache access 
> at the cost of more frequent cache misses.
> 
> All calls to `put()` remove expired items from the front of the queue, and 
> never perform a full scan. `get()` calls shuffle the queue, moving the 
> accessed item to the back. Compare this to original code where `put()` only 
> removed expired items when the cache overflowed, and scanned the entire cache.
> Let me give some examples. 
> **Example 1**: insertions at a fast pace leading to cache overflows and no 
> expirations. Here the new implementation improves performance. Consider a 
> cache with size=4, timeout=10, and the following sequence of events:
> T=1, put(1);
> T=2, put(2);
> T=3, put(3);
> T=4, put(4);
> Cache contents after these calls (same in old and new scenario). Queue order: 
> least recently accessed items on the left, most recently accessed on the 
> right. _K_ denotes cache key, _exp_ denotes entry expiration time and is 
> equal to insertion time _T_ plus timeout:
> 
> |K=1, exp=11|K=2, exp=12|K=3, exp=13|K=4, exp=14|
> 
> If we now add another item to the queue, it will overflow. Here's where the 
> implementations behave differently, but the outcome is identical: old one 
> scans the entire list for expired entries; new one improves performance by 
> ending the search for expired entries after encountering the first 
> non-expired entry (which is the first entry in the above example). The end 
> result is the same in both cases - oldest (least recently accessed) item is 
> dropped:
> T=5, put(5)
> 
> |K=2, exp=12|K=3, exp=13|K=4, exp=14|K=5, exp=15|
> 
> **Example 2**: insertions at a moderate pace, with interleaved reads. Here 
> the new implementation improves performance, but at a possible cost of 
> wasting cache capacity on expired entries. Consider a cache with size=4, 
> timeout=7, and the following sequence of events:
> T=1, put(1);
> T=3, put(3);
> T=5, put(5);
> T=7, put(7);
> T=7, get(1);
> Cache contents after these calls:
> 
> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=1, exp=8|
> 
> `get(1)` operation moved item with K=1 to the back of the queue.
> 
> If we wait for item with K=1 to expire and then add another item to the 
> queue, it will overflow. Here's where the implementations behave differently, 
> and the outcome is different: old one scans the entire list for expired 
> entries, finds entry with K=1 and drops it; new one gives up after first 
> non-expired entry (which is the first entry), and drops the first entry.
> 
> So, when we perform:
> T=9, put(9);
> 
> Old implementation will get:
> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=9, exp=16|
> 
> New implementation will get:
> |K=5, exp=12|K=7, exp=14|K=1, exp=8(expired)|K=9, exp=16|
> 
> Note that:
> - an attempt to retrieve expired item (i.e. `get(1)`) will immediately remove 
> that item from cache, making room for other items
> - retrieving a non-expired item will move it to the back of the queue, behind 
> all expired items
> 
> **Example 3**: insertions at a slow pace, where most items expire before 
> queue overflows. Here the new implementation improves memory consumption. 
> Consider a cache with size=4, timeout=1, and the following sequence of events:
> T=1, put(1);
> T=3, put(3);
> T=5, put(5);
> T=7, put(7);
> Every cache item is expired at then point when a new one is added. Old 
> implementation only removes expired entries when cache overflows, so all 
> entries will still be there:
> 
> |K=1, exp=2(expired)|K=3, exp=4(expired)|K=5, exp=6(expired)|K=7, exp=8|
> 
> New implementation removes expired entries on every put, so after the last 
> put only one entry is in the cache:
> 
> |K=7, exp=8|
> 
> After another put the old implementation will encounter a cache overflow and 
> r

Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v3]

2021-02-04 Thread djelinski
> Under certain load, MemoryCache operations take a substantial fraction of the 
> time needed to complete SSL handshakes. This series of patches improves 
> performance characteristics of MemoryCache, at the cost of a functional 
> change: expired entries are no longer guaranteed to be removed before live 
> ones. Unused entries are still removed before used ones, and cache 
> performance no longer depends on its capacity.
> 
> First patch in the series contains a benchmark that can be run with `make 
> test TEST="micro:CacheBench"`.
> Baseline results before any MemoryCache changes:
> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
> there's a nonlinear performance drop between 20480 and 204800 entries, 
> probably attributable to CPU cache thrashing. Beyond 204800 entries the cache 
> scales more linearly.
> 
> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
> only copy one:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
> The third patch improves worst-case times on a mostly idle cache by 
> scattering removal of expired entries over multiple `put` calls. It does not 
> affect performance of an overloaded cache.
> 
> The 4th patch removes all code that clears cached values before handing them 
> over to the GC. [This 
> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>  stated that clearing values was supposed to be a GC performance 
> optimization. It wasn't. Benchmark results after that commit:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
> I wasn't expecting that much of an improvement, and don't know how to explain 
> it.
> 
> The 40ns difference between cache with and without a timeout can be 
> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
> my VM.

djelinski has updated the pull request incrementally with one additional commit 
since the last revision:

  Add benchmarks for get and remove

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2255/files
  - new: https://git.openjdk.java.net/jdk/pull/2255/files/34949970..abe0e238

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=01-02

  Stats: 76 lines in 1 file changed: 54 ins; 4 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2255.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2255/head:pull/2255

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-02 Thread djelinski
On Tue, 2 Feb 2021 02:37:39 GMT, Xue-Lei Andrew Fan  wrote:

> the benchmark performance improvement is a trade-off between CPU and memory, 
> by keeping expired entries while putting a new entry in the cache

Not exactly. The memory use is capped by cache size. The patch is a trade off 
between the cache's hit/miss ratio and CPU; we will get faster cache access at 
the cost of more frequent cache misses.

All calls to `put()` remove expired items from the front of the queue, and 
never perform a full scan. `get()` calls shuffle the queue, moving the accessed 
item to the back. Compare this to original code where `put()` only removed 
expired items when the cache overflowed, and scanned the entire cache.
Let me give some examples. 
**Example 1**: insertions at a fast pace leading to cache overflows and no 
expirations. Here the new implementation improves performance. Consider a cache 
with size=4, timeout=10, and the following sequence of events:
T=1, put(1);
T=2, put(2);
T=3, put(3);
T=4, put(4);
Cache contents after these calls (same in old and new scenario). Queue order: 
least recently accessed items on the left, most recently accessed on the right. 
_K_ denotes cache key, _exp_ denotes entry expiration time and is equal to 
insertion time _T_ plus timeout:

|K=1, exp=11|K=2, exp=12|K=3, exp=13|K=4, exp=14|

If we now add another item to the queue, it will overflow. Here's where the 
implementations behave differently, but the outcome is identical: old one scans 
the entire list for expired entries; new one improves performance by ending the 
search for expired entries after encountering the first non-expired entry 
(which is the first entry in the above example). The end result is the same in 
both cases - oldest (least recently accessed) item is dropped:
T=5, put(5)

|K=2, exp=12|K=3, exp=13|K=4, exp=14|K=5, exp=15|

**Example 2**: insertions at a moderate pace, with interleaved reads. Here the 
new implementation improves performance, but at a possible cost of wasting 
cache capacity on expired entries. Consider a cache with size=4, timeout=7, and 
the following sequence of events:
T=1, put(1);
T=3, put(3);
T=5, put(5);
T=7, put(7);
T=7, get(1);
Cache contents after these calls:

|K=3, exp=10|K=5, exp=12|K=7, exp=14|K=1, exp=8|

`get(1)` operation moved item with K=1 to the back of the queue.

If we wait for item with K=1 to expire and then add another item to the queue, 
it will overflow. Here's where the implementations behave differently, and the 
outcome is different: old one scans the entire list for expired entries, finds 
entry with K=1 and drops it; new one gives up after first non-expired entry 
(which is the first entry), and drops the first entry.

So, when we perform:
T=9, put(9);

Old implementation will get:
|K=3, exp=10|K=5, exp=12|K=7, exp=14|K=9, exp=16|

New implementation will get:
|K=5, exp=12|K=7, exp=14|K=1, exp=8(expired)|K=9, exp=16|

Note that:
- an attempt to retrieve expired item (i.e. `get(1)`) will immediately remove 
that item from cache, making room for other items
- retrieving a non-expired item will move it to the back of the queue, behind 
all expired items

**Example 3**: insertions at a slow pace, where most items expire before queue 
overflows. Here the new implementation improves memory consumption. Consider a 
cache with size=4, timeout=1, and the following sequence of events:
T=1, put(1);
T=3, put(3);
T=5, put(5);
T=7, put(7);
Every cache item is expired at then point when a new one is added. Old 
implementation only removes expired entries when cache overflows, so all 
entries will still be there:

|K=1, exp=2(expired)|K=3, exp=4(expired)|K=5, exp=6(expired)|K=7, exp=8|

New implementation removes expired entries on every put, so after the last put 
only one entry is in the cache:

|K=7, exp=8|

After another put the old implementation will encounter a cache overflow and 
remove all expired items.

Let me know if that helped.

> add two more types of benchmark: get the entries and remove the entries

Both these operations are constant-time, both before and after my changes. Do 
you expect to see some oddities here, or do we just want a benchmark that could 
be used to compare other implementations?

> increase the size to some big scales, like 2M and 20M

Can do. Do you think it makes sense to also benchmark the scenario where GC 
kicks in and collects soft references?

> it may change the behavior of a few JDK components

Of all uses of Cache, only `SSLSessionContextImpl` (TLS session cache), 
`StatusResponseManager` (OCSP stapling) and `LDAPCertStoreImpl` (I'm not 
familiar with that one) set expiration timeout; when the timeout is not set, 
the behavior is exactly the same as before.
`StatusResponseManager` is constantly querying the same keys, and is liberally 
sized, so I don't expect much of an impact.
TLS session cache changes may result in fewer session resumptions and more full 
handshakes; I expect the cache performance improvement to more than off

Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-01 Thread Xue-Lei Andrew Fan
On Mon, 1 Feb 2021 18:49:04 GMT, djelinski 
 wrote:

>> Under certain load, MemoryCache operations take a substantial fraction of 
>> the time needed to complete SSL handshakes. This series of patches improves 
>> performance characteristics of MemoryCache, at the cost of a functional 
>> change: expired entries are no longer guaranteed to be removed before live 
>> ones. Unused entries are still removed before used ones, and cache 
>> performance no longer depends on its capacity.
>> 
>> First patch in the series contains a benchmark that can be run with `make 
>> test TEST="micro:CacheBench"`.
>> Baseline results before any MemoryCache changes:
>> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
>> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
>> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
>> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
>> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
>> there's a nonlinear performance drop between 20480 and 204800 entries, 
>> probably attributable to CPU cache thrashing. Beyond 204800 entries the 
>> cache scales more linearly.
>> 
>> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
>> only copy one:
>> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
>> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
>> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
>> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
>> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
>> The third patch improves worst-case times on a mostly idle cache by 
>> scattering removal of expired entries over multiple `put` calls. It does not 
>> affect performance of an overloaded cache.
>> 
>> The 4th patch removes all code that clears cached values before handing them 
>> over to the GC. [This 
>> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>>  stated that clearing values was supposed to be a GC performance 
>> optimization. It wasn't. Benchmark results after that commit:
>> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
>> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
>> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
>> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
>> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
>> I wasn't expecting that much of an improvement, and don't know how to 
>> explain it.
>> 
>> The 40ns difference between cache with and without a timeout can be 
>> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
>> my VM.
>
> djelinski has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Simplify makefile

If I get the patch right, the benchmark performance improvement is a trade-off 
between CPU and memory, by keeping expired entries while putting a new entry in 
the cache.  I'm not very sure of the performance impact on memory and GC 
collections.  Would you mind add two more types of benchmark: get the entries 
and remove the entries, for cases that there are 1/10, 1/4, 1/3 and 1/2 expired 
entries in the cache?  And increase the size to some big scales, like 2M and 
20M.

It looks like a spec update as it may change the behavior of a few JDK 
components (TLS session cache, OCSP stapling response cache, cert store cache, 
certificate factory, etc), because of "expired entries are no longer guaranteed 
to be removed before live ones".  I'm not very sure of the impact. I may 
suggest to file a CSR and have more eyes to check the compatibility impact 
before moving forward.

-

Changes requested by xuelei (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-01 Thread Xue-Lei Andrew Fan
On Mon, 1 Feb 2021 22:45:17 GMT, Claes Redestad  wrote:

>> Build change looks good, but I would like to hear from @cl4es too.
>
> Adding an `--add-exports` to `JAVAC_FLAGS` is a bit iffy, but should be OK. 
> Yes, all benchmarks will now be compiled with that package exported and 
> visible, but that should have no unintentional effect on other compilations.

The impact could beyond the JSSE implementation, andI will have a look as well.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-01 Thread Claes Redestad
On Mon, 1 Feb 2021 19:22:22 GMT, Erik Joelsson  wrote:

>> djelinski has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Simplify makefile
>
> Build change looks good, but I would like to hear from @cl4es too.

Adding an `--add-exports` to `JAVAC_FLAGS` is a bit iffy, but should be OK. 
Yes, all benchmarks will now be compiled with that package exported and 
visible, but that should have no unintentional effect on other compilations.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-01 Thread Erik Joelsson
On Mon, 1 Feb 2021 18:49:04 GMT, djelinski 
 wrote:

>> Under certain load, MemoryCache operations take a substantial fraction of 
>> the time needed to complete SSL handshakes. This series of patches improves 
>> performance characteristics of MemoryCache, at the cost of a functional 
>> change: expired entries are no longer guaranteed to be removed before live 
>> ones. Unused entries are still removed before used ones, and cache 
>> performance no longer depends on its capacity.
>> 
>> First patch in the series contains a benchmark that can be run with `make 
>> test TEST="micro:CacheBench"`.
>> Baseline results before any MemoryCache changes:
>> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
>> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
>> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
>> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
>> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
>> there's a nonlinear performance drop between 20480 and 204800 entries, 
>> probably attributable to CPU cache thrashing. Beyond 204800 entries the 
>> cache scales more linearly.
>> 
>> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
>> only copy one:
>> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
>> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
>> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
>> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
>> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
>> The third patch improves worst-case times on a mostly idle cache by 
>> scattering removal of expired entries over multiple `put` calls. It does not 
>> affect performance of an overloaded cache.
>> 
>> The 4th patch removes all code that clears cached values before handing them 
>> over to the GC. [This 
>> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>>  stated that clearing values was supposed to be a GC performance 
>> optimization. It wasn't. Benchmark results after that commit:
>> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
>> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
>> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
>> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
>> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
>> I wasn't expecting that much of an improvement, and don't know how to 
>> explain it.
>> 
>> The 40ns difference between cache with and without a timeout can be 
>> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
>> my VM.
>
> djelinski has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Simplify makefile

Build change looks good, but I would like to hear from @cl4es too.

-

Marked as reviewed by erikj (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-01 Thread djelinski
> Under certain load, MemoryCache operations take a substantial fraction of the 
> time needed to complete SSL handshakes. This series of patches improves 
> performance characteristics of MemoryCache, at the cost of a functional 
> change: expired entries are no longer guaranteed to be removed before live 
> ones. Unused entries are still removed before used ones, and cache 
> performance no longer depends on its capacity.
> 
> First patch in the series contains a benchmark that can be run with `make 
> test TEST="micro:CacheBench"`.
> Baseline results before any MemoryCache changes:
> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
> there's a nonlinear performance drop between 20480 and 204800 entries, 
> probably attributable to CPU cache thrashing. Beyond 204800 entries the cache 
> scales more linearly.
> 
> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
> only copy one:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
> The third patch improves worst-case times on a mostly idle cache by 
> scattering removal of expired entries over multiple `put` calls. It does not 
> affect performance of an overloaded cache.
> 
> The 4th patch removes all code that clears cached values before handing them 
> over to the GC. [This 
> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>  stated that clearing values was supposed to be a GC performance 
> optimization. It wasn't. Benchmark results after that commit:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
> I wasn't expecting that much of an improvement, and don't know how to explain 
> it.
> 
> The 40ns difference between cache with and without a timeout can be 
> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
> my VM.

djelinski has updated the pull request incrementally with one additional commit 
since the last revision:

  Simplify makefile

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2255/files
  - new: https://git.openjdk.java.net/jdk/pull/2255/files/5859a032..34949970

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2255.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2255/head:pull/2255

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability [v2]

2021-02-01 Thread djelinski
On Mon, 1 Feb 2021 18:35:56 GMT, djelinski 
 wrote:

>> Hm, maybe you just misunderstand how this makefile construct works. If you 
>> just want to add "--add-exports java.base/sun.security.util=ALL-UNNAMED", 
>> then that's all you should put in this assignment.
>
> yeah, I'm new to makefiles. Let me try that...

Removed. I could have sworn I tried this... but apparently I didn't. Thanks for 
the suggestion!

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability

2021-02-01 Thread djelinski
On Mon, 1 Feb 2021 18:31:39 GMT, Erik Joelsson  wrote:

>> Adding "--add-exports java.base/sun.security.util=ALL-UNNAMED" is fine, I'm 
>> asking about "$(JAVAC_FLAGS)".
>
> Hm, maybe you just misunderstand how this makefile construct works. If you 
> just want to add "--add-exports java.base/sun.security.util=ALL-UNNAMED", 
> then that's all you should put in this assignment.

yeah, I'm new to makefiles. Let me try that...

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability

2021-02-01 Thread Erik Joelsson
On Mon, 1 Feb 2021 18:24:46 GMT, djelinski 
 wrote:

>> make/test/BuildMicrobenchmark.gmk line 97:
>> 
>>> 95: SRC := $(MICROBENCHMARK_SRC), \
>>> 96: BIN := $(MICROBENCHMARK_CLASSES), \
>>> 97: JAVAC_FLAGS := $(JAVAC_FLAGS) --add-exports 
>>> java.base/sun.security.util=ALL-UNNAMED, \
>> 
>> Why do you need to add $(JAVAC_FLAGS) here?
>
> I'm trying to benchmark a class that is in a non-exported package 
> `sun.security.util`. Without this line the benchmark doesn't compile. I 
> couldn't find any other benchmarks that access non-exported classes, so I 
> came up with my own implementation. 
> 
> Is there a better way to get the benchmark to compile?

Adding "--add-exports java.base/sun.security.util=ALL-UNNAMED" is fine, I'm 
asking about "$(JAVAC_FLAGS)".

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability

2021-02-01 Thread Erik Joelsson
On Mon, 1 Feb 2021 18:30:14 GMT, Erik Joelsson  wrote:

>> I'm trying to benchmark a class that is in a non-exported package 
>> `sun.security.util`. Without this line the benchmark doesn't compile. I 
>> couldn't find any other benchmarks that access non-exported classes, so I 
>> came up with my own implementation. 
>> 
>> Is there a better way to get the benchmark to compile?
>
> Adding "--add-exports java.base/sun.security.util=ALL-UNNAMED" is fine, I'm 
> asking about "$(JAVAC_FLAGS)".

Hm, maybe you just misunderstand how this makefile construct works. If you just 
want to add "--add-exports java.base/sun.security.util=ALL-UNNAMED", then 
that's all you should put in this assignment.

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability

2021-02-01 Thread djelinski
On Mon, 1 Feb 2021 18:18:51 GMT, Erik Joelsson  wrote:

>> Under certain load, MemoryCache operations take a substantial fraction of 
>> the time needed to complete SSL handshakes. This series of patches improves 
>> performance characteristics of MemoryCache, at the cost of a functional 
>> change: expired entries are no longer guaranteed to be removed before live 
>> ones. Unused entries are still removed before used ones, and cache 
>> performance no longer depends on its capacity.
>> 
>> First patch in the series contains a benchmark that can be run with `make 
>> test TEST="micro:CacheBench"`.
>> Baseline results before any MemoryCache changes:
>> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
>> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
>> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
>> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
>> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
>> there's a nonlinear performance drop between 20480 and 204800 entries, 
>> probably attributable to CPU cache thrashing. Beyond 204800 entries the 
>> cache scales more linearly.
>> 
>> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
>> only copy one:
>> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
>> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
>> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
>> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
>> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
>> The third patch improves worst-case times on a mostly idle cache by 
>> scattering removal of expired entries over multiple `put` calls. It does not 
>> affect performance of an overloaded cache.
>> 
>> The 4th patch removes all code that clears cached values before handing them 
>> over to the GC. [This 
>> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>>  stated that clearing values was supposed to be a GC performance 
>> optimization. It wasn't. Benchmark results after that commit:
>> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
>> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
>> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
>> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
>> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
>> I wasn't expecting that much of an improvement, and don't know how to 
>> explain it.
>> 
>> The 40ns difference between cache with and without a timeout can be 
>> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
>> my VM.
>
> make/test/BuildMicrobenchmark.gmk line 97:
> 
>> 95: SRC := $(MICROBENCHMARK_SRC), \
>> 96: BIN := $(MICROBENCHMARK_CLASSES), \
>> 97: JAVAC_FLAGS := $(JAVAC_FLAGS) --add-exports 
>> java.base/sun.security.util=ALL-UNNAMED, \
> 
> Why do you need to add $(JAVAC_FLAGS) here?

I'm trying to benchmark a class that is in a non-exported package 
`sun.security.util`. Without this line the benchmark doesn't compile. I 
couldn't find any other benchmarks that access non-exported classes, so I came 
up with my own implementation. 

Is there a better way to get the benchmark to compile?

-

PR: https://git.openjdk.java.net/jdk/pull/2255


Re: RFR: 8259886 : Improve SSL session cache performance and scalability

2021-02-01 Thread Erik Joelsson
On Wed, 27 Jan 2021 11:32:08 GMT, djelinski 
 wrote:

> Under certain load, MemoryCache operations take a substantial fraction of the 
> time needed to complete SSL handshakes. This series of patches improves 
> performance characteristics of MemoryCache, at the cost of a functional 
> change: expired entries are no longer guaranteed to be removed before live 
> ones. Unused entries are still removed before used ones, and cache 
> performance no longer depends on its capacity.
> 
> First patch in the series contains a benchmark that can be run with `make 
> test TEST="micro:CacheBench"`.
> Baseline results before any MemoryCache changes:
> Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
> CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
> CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
> CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
> CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
> there's a nonlinear performance drop between 20480 and 204800 entries, 
> probably attributable to CPU cache thrashing. Beyond 204800 entries the cache 
> scales more linearly.
> 
> Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
> only copy one:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
> CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
> The third patch improves worst-case times on a mostly idle cache by 
> scattering removal of expired entries over multiple `put` calls. It does not 
> affect performance of an overloaded cache.
> 
> The 4th patch removes all code that clears cached values before handing them 
> over to the GC. [This 
> comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
>  stated that clearing values was supposed to be a GC performance 
> optimization. It wasn't. Benchmark results after that commit:
> Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
> CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
> CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
> CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
> CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
> I wasn't expecting that much of an improvement, and don't know how to explain 
> it.
> 
> The 40ns difference between cache with and without a timeout can be 
> attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on 
> my VM.

make/test/BuildMicrobenchmark.gmk line 97:

> 95: SRC := $(MICROBENCHMARK_SRC), \
> 96: BIN := $(MICROBENCHMARK_CLASSES), \
> 97: JAVAC_FLAGS := $(JAVAC_FLAGS) --add-exports 
> java.base/sun.security.util=ALL-UNNAMED, \

Why do you need to add $(JAVAC_FLAGS) here?

-

PR: https://git.openjdk.java.net/jdk/pull/2255


RFR: 8259886 : Improve SSL session cache performance and scalability

2021-02-01 Thread djelinski
Under certain load, MemoryCache operations take a substantial fraction of the 
time needed to complete SSL handshakes. This series of patches improves 
performance characteristics of MemoryCache, at the cost of a functional change: 
expired entries are no longer guaranteed to be removed before live ones. Unused 
entries are still removed before used ones, and cache performance no longer 
depends on its capacity.

First patch in the series contains a benchmark that can be run with `make test 
TEST="micro:CacheBench"`.
Baseline results before any MemoryCache changes:
Benchmark   (size)  (timeout)  Mode  Cnt ScoreError  Units
CacheBench.put   20480  86400  avgt   2583.653 ?  6.269  us/op
CacheBench.put   20480  0  avgt   25 0.107 ?  0.001  us/op
CacheBench.put  204800  86400  avgt   25  2057.781 ? 35.942  us/op
CacheBench.put  204800  0  avgt   25 0.108 ?  0.001  us/op
there's a nonlinear performance drop between 20480 and 204800 entries, probably 
attributable to CPU cache thrashing. Beyond 204800 entries the cache scales 
more linearly.

Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll 
only copy one:
Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
CacheBench.put   20480  86400  avgt   25  0.146 ? 0.002  us/op
CacheBench.put   20480  0  avgt   25  0.108 ? 0.002  us/op
CacheBench.put  204800  86400  avgt   25  0.150 ? 0.001  us/op
CacheBench.put  204800  0  avgt   25  0.106 ? 0.001  us/op
The third patch improves worst-case times on a mostly idle cache by scattering 
removal of expired entries over multiple `put` calls. It does not affect 
performance of an overloaded cache.

The 4th patch removes all code that clears cached values before handing them 
over to the GC. [This 
comment](https://github.com/openjdk/jdk/commit/5859a0320334bfb6b46b62eb16b4c387641f4a2a#diff-c6bd583a97fbc4f471621fee7eab37c63718cdb6932ce357fa403cfda4b32b6fL346)
 stated that clearing values was supposed to be a GC performance optimization. 
It wasn't. Benchmark results after that commit:
Benchmark   (size)  (timeout)  Mode  Cnt  Score   Error  Units
CacheBench.put   20480  86400  avgt   25  0.113 ? 0.001  us/op
CacheBench.put   20480  0  avgt   25  0.075 ? 0.002  us/op
CacheBench.put  204800  86400  avgt   25  0.116 ? 0.001  us/op
CacheBench.put  204800  0  avgt   25  0.072 ? 0.001  us/op
I wasn't expecting that much of an improvement, and don't know how to explain 
it.

The 40ns difference between cache with and without a timeout can be attributed 
to 2 `System.currentTimeMillis()` calls; they were pretty slow on my VM.

-

Commit messages:
 - Do not invalidate objects before GC
 - Always expunge on put
 - Stop scanning expired entries after first non-expired one
 - Add cache microbenchmark

Changes: https://git.openjdk.java.net/jdk/pull/2255/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2255&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8259886
  Stats: 138 lines in 3 files changed: 85 ins; 40 del; 13 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2255.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2255/head:pull/2255

PR: https://git.openjdk.java.net/jdk/pull/2255