On Tue, 2 Feb 2021 12:19:22 GMT, djelinski 
<github.com+30433125+djelin...@openjdk.org> wrote:

>> If I get the patch right, the benchmark performance improvement is a 
>> trade-off between CPU and memory, by keeping expired entries while putting a 
>> new entry in the cache.  I'm not very sure of the performance impact on 
>> memory and GC collections.  Would you mind adding two more types of benchmark: 
>> get the entries and remove the entries, for cases where there are 1/10, 1/4, 
>> 1/3 and 1/2 expired entries in the cache?  And increase the size to some big 
>> scales, like 2M and 20M.
>> 
>> It looks like a spec update as it may change the behavior of a few JDK 
>> components (TLS session cache, OCSP stapling response cache, cert store 
>> cache, certificate factory, etc), because of "expired entries are no longer 
>> guaranteed to be removed before live ones".  I'm not very sure of the 
>> impact. I may suggest filing a CSR and having more eyes check the 
>> compatibility impact before moving forward.
>
>> the benchmark performance improvement is a trade-off between CPU and memory, 
>> by keeping expired entries while putting a new entry in the cache
> 
> Not exactly. The memory use is capped by the cache size. The patch is a trade-off 
> between the cache's hit/miss ratio and CPU; we will get faster cache access 
> at the cost of more frequent cache misses.
> 
> All calls to `put()` remove expired items from the front of the queue, and 
> never perform a full scan. `get()` calls shuffle the queue, moving the 
> accessed item to the back. Compare this to the original code, where `put()` only 
> removed expired items when the cache overflowed, and scanned the entire cache.
> Let me give some examples. 
> **Example 1**: insertions at a fast pace leading to cache overflows and no 
> expirations. Here the new implementation improves performance. Consider a 
> cache with size=4, timeout=10, and the following sequence of events:
> T=1, put(1);
> T=2, put(2);
> T=3, put(3);
> T=4, put(4);
> Cache contents after these calls (identical in the old and new implementations). Queue order: 
> least recently accessed items on the left, most recently accessed on the 
> right. _K_ denotes cache key, _exp_ denotes entry expiration time and is 
> equal to insertion time _T_ plus timeout:
> 
> |K=1, exp=11|K=2, exp=12|K=3, exp=13|K=4, exp=14|
> 
> If we now add another item to the queue, it will overflow. Here's where the 
> implementations behave differently, but the outcome is identical: the old one 
> scans the entire list for expired entries, while the new one improves 
> performance by ending the search for expired entries after encountering the 
> first non-expired entry (which is the first entry in the above example). The 
> end result is the same in both cases - the oldest (least recently accessed) 
> item is dropped:
> T=5, put(5)
> 
> |K=2, exp=12|K=3, exp=13|K=4, exp=14|K=5, exp=15|
> 
> **Example 2**: insertions at a moderate pace, with interleaved reads. Here 
> the new implementation improves performance, but at a possible cost of 
> wasting cache capacity on expired entries. Consider a cache with size=4, 
> timeout=7, and the following sequence of events:
> T=1, put(1);
> T=3, put(3);
> T=5, put(5);
> T=7, put(7);
> T=7, get(1);
> Cache contents after these calls:
> 
> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=1, exp=8|
> 
> The `get(1)` operation moved the item with K=1 to the back of the queue.
> 
> If we wait for item with K=1 to expire and then add another item to the 
> queue, it will overflow. Here's where the implementations behave differently, 
> and the outcome is different: the old one scans the entire list for expired 
> entries, finds the entry with K=1 and drops it; the new one gives up after the 
> first non-expired entry (which is the first entry) and drops that first entry 
> instead.
> 
> So, when we perform:
> T=9, put(9);
> 
> Old implementation will get:
> |K=3, exp=10|K=5, exp=12|K=7, exp=14|K=9, exp=16|
> 
> New implementation will get:
> |K=5, exp=12|K=7, exp=14|K=1, exp=8(expired)|K=9, exp=16|
> 
> Note that:
> - an attempt to retrieve expired item (i.e. `get(1)`) will immediately remove 
> that item from cache, making room for other items
> - retrieving a non-expired item will move it to the back of the queue, behind 
> all expired items
> 
> **Example 3**: insertions at a slow pace, where most items expire before 
> queue overflows. Here the new implementation improves memory consumption. 
> Consider a cache with size=4, timeout=1, and the following sequence of events:
> T=1, put(1);
> T=3, put(3);
> T=5, put(5);
> T=7, put(7);
> Every cache item has expired by the time a new one is added. The old 
> implementation only removes expired entries when the cache overflows, so all 
> entries will still be there:
> 
> |K=1, exp=2(expired)|K=3, exp=4(expired)|K=5, exp=6(expired)|K=7, exp=8|
> 
> The new implementation removes expired entries on every `put()`, so after the 
> last `put()` only one entry is in the cache:
> 
> |K=7, exp=8|
> 
> After another `put()` the old implementation will encounter a cache overflow 
> and remove all expired items.
> 
> Let me know if that helped.
> 
>> add two more types of benchmark: get the entries and remove the entries
> 
> Both these operations are constant-time, both before and after my changes. Do 
> you expect to see some oddities here, or do we just want a benchmark that 
> could be used to compare other implementations?
> 
>> increase the size to some big scales, like 2M and 20M
> 
> Can do. Do you think it makes sense to also benchmark the scenario where GC 
> kicks in and collects soft references?
> 
>> it may change the behavior of a few JDK components
> 
> Of all uses of Cache, only `SSLSessionContextImpl` (TLS session cache), 
> `StatusResponseManager` (OCSP stapling) and `LDAPCertStoreImpl` (I'm not 
> familiar with that one) set an expiration timeout; when the timeout is not set, 
> the behavior is exactly the same as before.
> `StatusResponseManager` is constantly querying the same keys, and is 
> liberally sized, so I don't expect much of an impact.
> TLS session cache changes may result in fewer session resumptions and more 
> full handshakes; I expect the cache performance improvement to more than 
> offset the CPU cycles lost on full handshakes.
> 
> How do I file a CSR?
> 
> Also, what do you think about the changes in commit 5859a03 (`Do not 
> invalidate objects before GC`)? They offer a minor performance improvement, but if 
> clearing the values before GC is an important security feature of this cache, 
> I'm prepared to drop that commit.
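
To make the quoted walkthrough concrete, here is a minimal, self-contained model of the patched eviction policy. All names here (`TinyCache`, `Entry`, and so on) are illustrative only; this is a sketch of the semantics described above, not the actual `sun.security.util.Cache` code:

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;

// Illustrative model of the patched cache (not the JDK code): put() evicts
// expired entries only from the front of the queue, stopping at the first
// live entry; get() moves a live entry to the back and drops an expired one.
class TinyCache {
    record Entry(int key, long exp) {}

    private final int capacity;
    private final long timeout;
    private final ArrayDeque<Entry> queue = new ArrayDeque<>();

    TinyCache(int capacity, long timeout) {
        this.capacity = capacity;
        this.timeout = timeout;
    }

    void put(int key, long now) {
        // Remove expired entries from the front only; never a full scan.
        while (!queue.isEmpty() && queue.peekFirst().exp() <= now) {
            queue.pollFirst();
        }
        if (queue.size() >= capacity) {
            queue.pollFirst(); // overflow: drop the least recently accessed entry
        }
        queue.addLast(new Entry(key, now + timeout));
    }

    boolean get(int key, long now) {
        for (Iterator<Entry> it = queue.iterator(); it.hasNext(); ) {
            Entry e = it.next();
            if (e.key() == key) {
                it.remove();
                if (e.exp() <= now) {
                    return false;      // expired: drop it, freeing a slot
                }
                queue.addLast(e);      // live: move to the back of the queue
                return true;
            }
        }
        return false;
    }

    List<Integer> keys() {
        return queue.stream().map(Entry::key).toList();
    }
}

public class CacheModel {
    public static void main(String[] args) {
        // Example 2: size=4, timeout=7, interleaved read
        TinyCache ex2 = new TinyCache(4, 7);
        ex2.put(1, 1); ex2.put(3, 3); ex2.put(5, 5); ex2.put(7, 7);
        ex2.get(1, 7);                    // moves K=1 (exp=8) to the back
        ex2.put(9, 9);                    // K=1 is expired, but hidden behind live K=3
        System.out.println(ex2.keys());   // prints [5, 7, 1, 9]

        // Example 3: size=4, timeout=1, slow inserts
        TinyCache ex3 = new TinyCache(4, 1);
        ex3.put(1, 1); ex3.put(3, 3); ex3.put(5, 5); ex3.put(7, 7);
        System.out.println(ex3.keys());   // prints [7]
    }
}
```

`ex2` reproduces Example 2: the expired K=1 entry survives `put(9)` because the front-of-queue scan stops at the live K=3 entry, and K=1 is only dropped when it is next accessed. `ex3` reproduces Example 3: every `put()` strips the expired entries from the front, leaving a single live entry.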

Added benchmarks for get & remove. Added tests for 5M cache size. Switched time 
units to nanoseconds. Results:
Benchmark           (size)  (timeout)  Mode  Cnt    Score   Error  Units
CacheBench.get       20480      86400  avgt   25   62.999 ± 2.017  ns/op
CacheBench.get       20480          0  avgt   25   41.519 ± 1.113  ns/op
CacheBench.get      204800      86400  avgt   25   67.995 ± 4.530  ns/op
CacheBench.get      204800          0  avgt   25   46.439 ± 2.222  ns/op
CacheBench.get     5120000      86400  avgt   25   72.516 ± 0.759  ns/op
CacheBench.get     5120000          0  avgt   25   53.471 ± 0.491  ns/op
CacheBench.put       20480      86400  avgt   25  117.117 ± 3.424  ns/op
CacheBench.put       20480          0  avgt   25   73.582 ± 1.484  ns/op
CacheBench.put      204800      86400  avgt   25  116.983 ± 0.743  ns/op
CacheBench.put      204800          0  avgt   25   73.945 ± 0.515  ns/op
CacheBench.put     5120000      86400  avgt   25  230.878 ± 7.582  ns/op
CacheBench.put     5120000          0  avgt   25  192.526 ± 7.048  ns/op
CacheBench.remove    20480      86400  avgt   25   39.048 ± 2.036  ns/op
CacheBench.remove    20480          0  avgt   25   36.293 ± 0.281  ns/op
CacheBench.remove   204800      86400  avgt   25   43.899 ± 0.895  ns/op
CacheBench.remove   204800          0  avgt   25   43.046 ± 0.759  ns/op
CacheBench.remove  5120000      86400  avgt   25   51.896 ± 0.640  ns/op
CacheBench.remove  5120000          0  avgt   25   51.537 ± 0.536  ns/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/2255
