On Tue, 2 Feb 2021 02:37:39 GMT, Xue-Lei Andrew Fan <xue...@openjdk.org> wrote:

> the benchmark performance improvement is a trade-off between CPU and memory, 
> by keeping expired entries while putting a new entry in the cache

Not exactly. Memory use is capped by the cache size. The patch is a trade-off 
between the cache's hit/miss ratio and CPU: we get faster cache access at the 
cost of more frequent cache misses.

All calls to `put()` remove expired items from the front of the queue, and 
never perform a full scan. `get()` calls shuffle the queue, moving the accessed 
item to the back. Compare this to the original code, where `put()` removed 
expired items only when the cache overflowed, and scanned the entire cache to 
do so.
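To make the mechanics concrete, here's a minimal sketch of the new strategy. This is not the actual `sun.security.util.Cache` code; `SketchCache` and its methods are made-up names, and I'm using a `LinkedHashMap` in access order as a stand-in for the real queue. The `main` method replays the sequence from Example 2 below:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

/** Toy model of the NEW expunge strategy (illustration only, not JDK code):
 *  put() trims expired entries from the front of the queue and stops at the
 *  first live entry; get() moves the accessed entry to the back and eagerly
 *  drops it if it has expired. */
class SketchCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiry;   // absolute expiration time (insertion time + timeout)
        Entry(V value, long expiry) { this.value = value; this.expiry = expiry; }
    }

    private final int maxSize;
    // accessOrder=true: iteration goes from least to most recently accessed,
    // matching the queue order used in the examples
    private final LinkedHashMap<K, Entry<V>> map =
            new LinkedHashMap<>(16, 0.75f, true);

    SketchCache(int maxSize) { this.maxSize = maxSize; }

    void put(K key, V value, long now, long timeout) {
        // New behavior: drop expired entries from the front only,
        // stopping at the first non-expired entry -- no full scan.
        for (Iterator<Entry<V>> it = map.values().iterator(); it.hasNext(); ) {
            if (it.next().expiry <= now) it.remove(); else break;
        }
        map.put(key, new Entry<>(value, now + timeout));
        if (map.size() > maxSize) {           // overflow: evict the LRU entry
            Iterator<K> it = map.keySet().iterator();
            it.next();
            it.remove();
        }
    }

    V get(K key, long now) {
        Entry<V> e = map.get(key);            // moves the entry to the back
        if (e == null) return null;
        if (e.expiry <= now) {                // expired: drop it immediately
            map.remove(key);
            return null;
        }
        return e.value;
    }

    int size() { return map.size(); }

    public static void main(String[] args) {
        SketchCache<Integer, String> c = new SketchCache<>(4);  // size=4, timeout=7
        c.put(1, "v1", 1, 7);
        c.put(3, "v3", 3, 7);
        c.put(5, "v5", 5, 7);
        c.put(7, "v7", 7, 7);
        c.get(1, 7);                          // moves K=1 behind K=7
        c.put(9, "v9", 9, 7);                 // evicts K=3; expired K=1 stays
        System.out.println(c.map.keySet());   // prints [5, 7, 1, 9]
    }
}
```

Note how `put(9, ...)` evicts the live K=3 entry while the already-expired K=1 entry survives at position 3 of the queue; that's exactly the hit/miss-ratio cost described above.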
Let me give some examples. 
**Example 1**: insertions at a fast pace leading to cache overflows and no 
expirations. Here the new implementation improves performance. Consider a cache 
with size=4, timeout=10, and the following sequence of events:
T=1, put(1);
T=2, put(2);
T=3, put(3);
T=4, put(4);
Cache contents after these calls (the same in the old and new 
implementations). Queue order: least recently accessed items on the left, most 
recently accessed on the right. _K_ denotes the cache key; _exp_ denotes the 
entry expiration time and is equal to the insertion time _T_ plus the timeout:

|K=1, exp=11|K=2, exp=12|K=3, exp=13|K=4, exp=14|

If we now add another item to the queue, it will overflow. Here's where the 
implementations behave differently, but the outcome is identical: the old one 
scans the entire list for expired entries; the new one improves performance by 
ending the search for expired entries at the first non-expired entry (which is 
the very first entry in the example above). The end result is the same in both 
cases: the oldest (least recently accessed) item is dropped:
T=5, put(5);

|K=2, exp=12|K=3, exp=13|K=4, exp=14|K=5, exp=15|

**Example 2**: insertions at a moderate pace, with interleaved reads. Here the 
new implementation improves performance, but at the possible cost of wasting 
cache capacity on expired entries. Consider a cache with size=4, timeout=7, and 
the following sequence of events:
T=1, put(1);
T=3, put(3);
T=5, put(5);
T=7, put(7);
T=7, get(1);
Cache contents after these calls:

|K=3, exp=10|K=5, exp=12|K=7, exp=14|K=1, exp=8|

The `get(1)` operation moved the item with K=1 to the back of the queue.

If we wait for the item with K=1 to expire and then add another item to the 
queue, it will overflow. Here's where the implementations behave differently, 
and the outcome differs as well: the old one scans the entire list for expired 
entries, finds the entry with K=1 and drops it; the new one stops at the first 
non-expired entry (which is the very first entry in the queue) and drops that 
first entry instead.

So, when we perform:
T=9, put(9);

The old implementation will get:
|K=3, exp=10|K=5, exp=12|K=7, exp=14|K=9, exp=16|

The new implementation will get:
|K=5, exp=12|K=7, exp=14|K=1, exp=8(expired)|K=9, exp=16|

Note that:
- an attempt to retrieve an expired item (e.g. `get(1)` at this point) will 
immediately remove that item from the cache, making room for other items
- retrieving a non-expired item will move it to the back of the queue, behind 
all expired items

**Example 3**: insertions at a slow pace, where most items expire before the 
queue overflows. Here the new implementation reduces memory consumption. 
Consider a cache with size=4, timeout=1, and the following sequence of events:
T=1, put(1);
T=3, put(3);
T=5, put(5);
T=7, put(7);
Every cache item has expired by the time a new one is added. The old 
implementation removes expired entries only when the cache overflows, so all 
entries will still be there:

|K=1, exp=2(expired)|K=3, exp=4(expired)|K=5, exp=6(expired)|K=7, exp=8|

The new implementation removes expired entries on every put, so after the last 
put only one entry remains in the cache:

|K=7, exp=8|

After one more put, the old implementation would encounter a cache overflow and 
remove all expired items.
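Example 3 can be replayed with a toy simulation that tracks expiry times only. This is illustration, not JDK code; `ExpungeDemo` and `sizeAfter` are made-up names, and with no interleaved `get()` calls the queue order coincides with expiry order, which keeps the model simple:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy comparison (hypothetical helper, not JDK code) of how many entries
 *  each expunge strategy keeps after the put sequence of Example 3. */
class ExpungeDemo {
    static int sizeAfter(long[] putTimes, long timeout, int maxSize,
                         boolean expungeEveryPut) {
        Deque<Long> expiries = new ArrayDeque<>();  // front = least recently accessed
        for (long t : putTimes) {
            if (expungeEveryPut) {
                // New strategy: trim expired entries from the front on every put
                while (!expiries.isEmpty() && expiries.peekFirst() <= t) {
                    expiries.pollFirst();
                }
            }
            expiries.addLast(t + timeout);
            if (expiries.size() > maxSize) {
                // Old strategy expunges (full scan) only on overflow; here
                // removeIf plays the role of that full scan
                final long now = t;
                expiries.removeIf(e -> e <= now);
                if (expiries.size() > maxSize) expiries.pollFirst();
            }
        }
        return expiries.size();
    }

    public static void main(String[] args) {
        long[] puts = {1, 3, 5, 7};                       // Example 3: size=4, timeout=1
        System.out.println(sizeAfter(puts, 1, 4, false)); // old strategy: 4 entries linger
        System.out.println(sizeAfter(puts, 1, 4, true));  // new strategy: 1 entry
    }
}
```

The old strategy never sees an overflow in this sequence, so all four entries (three of them expired) stay resident; the new strategy keeps the cache at one live entry throughout.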

Let me know if that helped.

> add two more types of benchmark: get the entries and remove the entries

Both of these operations are constant-time, both before and after my changes. 
Do you expect to see oddities here, or do we just want a benchmark that could 
be used to compare other implementations?

> increase the size to some big scales, like 2M and 20M

Can do. Do you think it makes sense to also benchmark the scenario where GC 
kicks in and collects soft references?

> it may change the behavior of a few JDK components

Of all the uses of `Cache`, only `SSLSessionContextImpl` (TLS session cache), 
`StatusResponseManager` (OCSP stapling) and `LDAPCertStoreImpl` (I'm not 
familiar with that one) set an expiration timeout; when the timeout is not set, 
the behavior is exactly the same as before.
`StatusResponseManager` is constantly querying the same keys, and is liberally 
sized, so I don't expect much of an impact.
TLS session cache changes may result in fewer session resumptions and more full 
handshakes; I expect the cache performance improvement to more than offset the 
CPU cycles lost on full handshakes.

How do I file a CSR?

Also, what do you think about the changes in commit 5859a03 (`Do not 
invalidate objects before GC`)? They offer a minor performance improvement, but 
if clearing the values before GC is an important security feature of this 
cache, I'm prepared to drop that commit.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2255
