On Wed, 10 Feb 2021 19:19:33 GMT, djelinski <github.com+30433125+djelin...@openjdk.org> wrote:
>>> Thanks for your review! Some comments below.
>>>
>>> > A full handshake or OCSP status grabbing could counteract all the performance gain with the cache update.
>>>
>>> Yes, but that's unlikely. Note that K=3 is before K=1 in the queue only because 3 wasn't used since 1 was last used. This means that either K=3 is used less frequently than K=1, or that all cached items are in active use. In the former case we don't lose much by dropping K=3 (granted, there's nothing to offset that). In the latter case we are dealing with a full cache at all times, which means that most `put()`s would scan the queue, and we would gain a lot by finishing faster.
>>
>> I may think of it differently. It may be hard to predict the future frequency of a cached item based on its past behavior. For the above case, I'm not sure that K=3 is used less frequently than K=1. Maybe, in the next few seconds, K=1 could be used more frequently.
>>
>> I would like a solution that follows the timeout specification: keep the newer items if possible.
>>
>>> > get() [..] without [..] change the order of the queue
>>>
>>> If we do that, frequently used entries will be evicted at the same age as never-used ones. This means we will have to recompute (full handshake/fresh OCSP) both the frequently used and the infrequently used entries. It's better to recompute only the infrequently used ones, and reuse the frequently used ones as long as possible - we will do less work that way. That's probably the reason why a `LinkedHashMap` with `accessOrder=true` was chosen as the backing store implementation originally.
>>
>> See above. It may be true that the frequency can be determined in some cases, but Cache is a general class, and we may want to be more careful about whether we are really able to determine the frequency within the Cache implementation.
>>
>>> > get() performance is more important [..] so we should try to keep the cache small if possible
>>>
>>> I don't see the link; could you explain?
>>
>> link? Did you mean the link to the get() method? It is a method in the Cache class.
>>
>>> > In the update, the SoftReference.clear() got removed. I'm not sure of the impact of the enqueued objects any longer. In theory, it could improve the memory use, which could counteract the performance gain in some situations.
>>>
>>> That's the best part: no objects ever get enqueued! We only called `clear()` right before losing the last reference to `SoftCacheEntry` (which is the `SoftReference`). When GC collects the `SoftReference`, it does not enqueue anything. GC only enqueues the `SoftReference` when it collects the referenced object (session / OCSP response) without collecting the `SoftReference` (cache entry) itself. This is [documented behavior](https://docs.oracle.com/javase/7/docs/api/java/lang/ref/package-summary.html): _If a registered reference becomes unreachable itself, then it will never be enqueued._
>>
>> I need more time for this section.
>>
>>> > Could you edit the bug
>>>
>>> I'd need an account on the bug tracker first.
>>
>> Okay. No worries, I will help you if we can get an agreement about the update.

> I may think of it differently. It may be hard to predict the future frequency of a cached item based on its past behavior. For the above case, I'm not sure that K=3 is used less frequently than K=1. Maybe, in the next few seconds, K=1 could be used more frequently.

I agree that such a prediction might not be 100% accurate. But a quick Google search reveals that there are [many](https://www.usenix.org/system/files/hotstorage20_paper_eytan.pdf) [articles](https://link.springer.com/article/10.1007/PL00009255) claiming that LRU caches offer better hit rates than FIFO, especially for in-memory caches.
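The `accessOrder` distinction discussed above can be demonstrated directly with `LinkedHashMap`. The following sketch is illustrative only (the class and helper names are mine, not from the actual `Cache` code): with `accessOrder=true`, `get()` moves an entry to the tail of the iteration order, so the eldest entry is the least recently *used* one (LRU); with `accessOrder=false`, the eldest is simply the oldest *insertion* (FIFO).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EvictionOrderDemo {
    // A bounded map: removeEldestEntry evicts the head of the
    // iteration order once the size limit is exceeded.
    static <K, V> Map<K, V> boundedMap(int maxSize, boolean accessOrder) {
        return new LinkedHashMap<K, V>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public static void main(String[] args) {
        // accessOrder=true: get() refreshes an entry's position,
        // so the least recently used entry is evicted (LRU).
        Map<Integer, String> lru = boundedMap(3, true);
        lru.put(1, "a"); lru.put(2, "b"); lru.put(3, "c");
        lru.get(1);          // K=1 is now more recently used than K=3
        lru.put(4, "d");     // evicts K=2 (least recently used)
        System.out.println("LRU keys:  " + lru.keySet());   // [3, 1, 4]

        // accessOrder=false: iteration stays in insertion order,
        // so the oldest entry is evicted regardless of use (FIFO).
        Map<Integer, String> fifo = boundedMap(3, false);
        fifo.put(1, "a"); fifo.put(2, "b"); fifo.put(3, "c");
        fifo.get(1);         // no effect on order
        fifo.put(4, "d");    // evicts K=1 (oldest insertion)
        System.out.println("FIFO keys: " + fifo.keySet());  // [2, 3, 4]
    }
}
```

Note how the same `get(1)` call protects K=1 from eviction in the LRU map but is irrelevant in the FIFO map; that is exactly the trade-off being debated.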
>> I would like a solution that follows the timeout specification: keep the newer items if possible.

That's a trivial change; all we need to do is change `true` to `false` [here](https://github.com/openjdk/jdk/blob/abe0e238bd25adb1ddd2b655613899bfa063cd85/src/java.base/share/classes/sun/security/util/Cache.java#L268). But, as stated above, LRU is better than FIFO, so I wouldn't want to do that.

I could keep LRU and add another linked list that would store items in the order of their expiration dates; then we could quickly scan that list for expired items. Note: the order of expiration dates is not necessarily the order of insertion, because 1) `System.currentTimeMillis()` is not monotonic - it can move back when something changes the system time, and 2) the expiration date is calculated at insertion time, so if someone changes the timeout on a non-empty cache, new items may have shorter expiration times than old ones. So I'd either need to address that first (change `currentTimeMillis` to `nanoTime` and store creation time instead of expiration time), or use insertion sort for adding items (which would get very slow if either of the above-mentioned situations happened). Let me know your thoughts.

Well, if removing all expired items before evicting live ones is non-negotiable, implementing all operations in constant time is much easier with FIFO, where we only need to keep one item order.
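The fix described here (monotonic clock, store creation time, apply the timeout at check time) can be sketched as follows. This is a hypothetical fragment, not the actual `Cache` entry type: the point is that `System.nanoTime()` never moves backwards, and because the timeout is applied when checking rather than baked in at insertion, changing the timeout on a non-empty cache still keeps expiration order equal to insertion order.

```java
import java.util.concurrent.TimeUnit;

public class ExpiryDemo {
    // Hypothetical entry that records its insertion time (not its
    // expiration time), measured on the monotonic nanoTime clock.
    static final class Entry {
        final long insertedNanos = System.nanoTime();
        final String value;
        Entry(String value) { this.value = value; }
    }

    // The timeout is applied at read time, so it affects old and new
    // entries uniformly, and insertion order == expiration order.
    static boolean isExpired(Entry e, long timeoutNanos) {
        // Compare differences, not absolute values: nanoTime() has an
        // arbitrary origin and may be negative, but (now - then) is
        // always a valid elapsed duration.
        return System.nanoTime() - e.insertedNanos > timeoutNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        long timeout = TimeUnit.MILLISECONDS.toNanos(50);
        Entry e = new Entry("session");
        System.out.println("fresh expired?  " + isExpired(e, timeout));
        Thread.sleep(100);
        System.out.println("after 100 ms?   " + isExpired(e, timeout));
    }
}
```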
The new commits contain the following changes:
- use `nanoTime` instead of `currentTimeMillis` to make sure that time never goes back
- store insertion time instead of expiration time, so that older items always expire before newer ones, even when the timeout is changed
- change the internal hash map to store (and evict) items in insertion (FIFO) order
- always stop scanning entries after finding the first non-expired item, because subsequent items are now guaranteed to have later expiration dates, and collected soft references are handled by the reference queue

tier1 and jdk_security tests passed; benchmark results show only minimal changes.

I verified that none of the classes using `Cache` mentions LRU; it looks like this was an implementation detail.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2255
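The early-exit scan enabled by the changes above (FIFO order plus insertion-time expiry) can be sketched like this. All names are illustrative, not the actual `Cache` internals: once the first non-expired entry is found, every later entry was inserted more recently and therefore expires later, so the scan can stop immediately instead of walking the whole queue.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpungeDemo {
    // Hypothetical FIFO cache fragment mapping keys to their
    // nanoTime insertion timestamps, in insertion order.
    static void expungeExpired(LinkedHashMap<String, Long> insertedAt,
                               long timeoutNanos) {
        long now = System.nanoTime();
        Iterator<Map.Entry<String, Long>> it =
                insertedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() <= timeoutNanos) {
                break;       // all following entries are newer: stop
            }
            it.remove();     // expired: drop and keep scanning
        }
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Long> cache = new LinkedHashMap<>();
        long now = System.nanoTime();
        cache.put("old-1", now - 3_000_000_000L);  // inserted 3 s ago
        cache.put("old-2", now - 2_000_000_000L);  // inserted 2 s ago
        cache.put("fresh", now);                   // just inserted
        expungeExpired(cache, 1_000_000_000L);     // 1 s timeout
        System.out.println(cache.keySet());        // [fresh]
    }
}
```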