gnodet commented on PR #22311:
URL: https://github.com/apache/camel/pull/22311#issuecomment-4148963092

   _Claude Code on behalf of Guillaume Nodet_
   
   ## Follow-up: What about fixing `SimpleLRUCache` directly?
   
   ### The root cause
   
   `SimpleLRUCache.remove(Object key)` (line 183) does `delegate.remove(key)` 
but **doesn't decrement `totalChanges`** and **doesn't remove the key's entry 
from `lastChanges`**. So the deque accumulates "ghost" entries — keys that were 
added and then removed from the delegate but still sit in the deque.
   
   When `totalChanges` exceeds `2 * maximumCacheSize` (100,000 for the file 
component), `compressChangesIfNeeded()` fires on every subsequent `put()`. It 
iterates the entire deque, deduplicates by key, but ghost entries for removed 
keys still survive deduplication (they're unique keys). The deque never shrinks 
below the number of distinct ghost keys.
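
   The accumulation described above can be reproduced with a minimal, self-contained toy stand-in for `SimpleLRUCache`'s bookkeeping (the class and field names below mirror the description but are illustrative, not Camel's actual code):

   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;
   import java.util.HashMap;
   import java.util.Map;
   import java.util.concurrent.atomic.AtomicInteger;

   // Toy stand-in for SimpleLRUCache's bookkeeping: a delegate map plus a
   // change deque, mirroring the buggy remove() described above.
   class GhostDemo {
       static final Map<String, String> delegate = new HashMap<>();
       static final Deque<String> lastChanges = new ArrayDeque<>();
       static final AtomicInteger totalChanges = new AtomicInteger();

       static void put(String key, String value) {
           delegate.put(key, value);
           lastChanges.addLast(key);      // every put records a change
           totalChanges.incrementAndGet();
       }

       static void remove(String key) {
           delegate.remove(key);          // bug: deque entry and counter untouched
       }

       public static void main(String[] args) {
           for (int i = 0; i < 1000; i++) {
               String key = "file-" + i;
               put(key, "in-progress");
               remove(key);               // processed and removed
           }
           // The cache is empty, but the deque holds 1000 ghost entries that
           // deduplication cannot discard (every key is distinct).
           System.out.println(delegate.size());
           System.out.println(lastChanges.size());
       }
   }
   ```

   This is exactly the in-progress repository pattern: each file key is added once and removed once, so every entry in the deque becomes a ghost.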
   
   ### What a fix would look like
   
   1. Decrement `totalChanges` in `remove()` to prevent unbounded growth:
   
   ```java
   @Override
   public V remove(Object key) {
       ValueHolder<V> removed = delegate.remove(key);
       if (removed != null) {
           totalChanges.decrementAndGet();
       }
       return extractValue(removed);
   }
   ```
   
   2. Filter out ghost entries in `compressChangesIfNeeded()`:
   
   ```java
   private void compressChangesIfNeeded() {
       if (isQueueFull()) {
           Deque<Entry<K, ValueHolder<V>>> newChanges = new ConcurrentLinkedDeque<>();
           Deque<Entry<K, ValueHolder<V>>> currentChanges = lastChanges.getAndSet(newChanges);
           Set<K> keys = new HashSet<>();
           Entry<K, ValueHolder<V>> entry;
           while ((entry = currentChanges.pollLast()) != null) {
               // skip ghost entries for keys no longer in the cache
               if (keys.add(entry.getKey()) && delegate.containsKey(entry.getKey())) {
                   newChanges.addFirst(entry);
               }
           }
           totalChanges.set(newChanges.size());
       }
   }
   ```
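
   The filtering step can be checked in isolation with a toy model (plain `String` keys and stdlib collections standing in for Camel's types):

   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;

   // Toy model of the proposed compression: deduplicate by key, newest
   // occurrence wins, and drop entries whose key is no longer in the
   // delegate (ghost entries).
   class CompressDemo {
       static Deque<String> compress(Deque<String> changes, Map<String, String> delegate) {
           Deque<String> compressed = new ArrayDeque<>();
           Set<String> seen = new HashSet<>();
           String key;
           while ((key = changes.pollLast()) != null) {
               if (seen.add(key) && delegate.containsKey(key)) {
                   compressed.addFirst(key);
               }
           }
           return compressed;
       }
   }
   ```

   A deque of `[ghost-1, live, ghost-2, live]` against a delegate containing only `live` compresses to `[live]`: the duplicate is deduplicated and both ghosts are dropped, so the deque can finally shrink below the number of distinct ghost keys.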
   
   ### Benefits vs. drawbacks
   
   | | Fix `SimpleLRUCache` | This PR's `LinkedHashMap` approach |
   |---|---|---|
   | **Scope** | Fixes all callers (file, AWS S3, IBM COS, any future use) | Only fixes the file component |
   | **Complexity** | Small, targeted change in existing code | Adds a new factory method + new code path |
   | **LRU semantics** | Preserves LRU eviction (access-order) | Switches to insertion-order eviction (FIFO) |
   | **Thread safety** | Already thread-safe (`ConcurrentHashMap` + locks) | Relies on `MemoryIdempotentRepository`'s lock; `getCacheSize()`/`doStop()` are unguarded |
   | **Risk** | Higher — `SimpleLRUCache` is used broadly, subtle concurrency bugs possible | Lower — isolated change, only affects file component default |
   | **Performance** | Slightly heavier (deque + ConcurrentHashMap overhead) | Lighter (plain `LinkedHashMap`, no deque) |
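
   For reference, the insertion-order (FIFO) eviction in the right-hand column can be had from `LinkedHashMap` alone via `removeEldestEntry` — a sketch of the idea, not the PR's actual code (class name and size bound are illustrative):

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Bounded insertion-order map: once the bound is exceeded, the oldest
   // insertion is evicted. No deque, no change counters — so ghost entries
   // cannot accumulate by construction.
   class BoundedInProgressMap<K, V> extends LinkedHashMap<K, V> {
       private final int maximumSize;

       BoundedInProgressMap(int maximumSize) {
           // accessOrder = false: plain insertion order, i.e. FIFO eviction
           super(16, 0.75f, false);
           this.maximumSize = maximumSize;
       }

       @Override
       protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
           return size() > maximumSize;
       }
   }
   ```

   Removal is just `Map.remove()`, with no side bookkeeping to keep consistent — which is the structural reason this approach sidesteps the ghost-entry bug entirely.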
   
   ### Recommendation
   
   Ideally do both: fix `SimpleLRUCache` because the ghost entry bug is a real 
defect that will bite other callers (AWS S3 and IBM COS endpoints have the same 
in-progress repository pattern). But also keep the `LinkedHashMap` approach for 
the in-progress repository — it's simpler, lighter, and a better semantic fit 
since in-progress files don't need LRU reordering (they're added, processed, 
then removed). The `SimpleLRUCache` fix could be tracked as a separate JIRA 
ticket.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
