[ https://issues.apache.org/jira/browse/GEODE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250671#comment-17250671 ]
Barrett Oglesby commented on GEODE-8278:
----------------------------------------

I have a test that:

- starts 2 servers, each with:
  - persistent data partitioned region
  - persistent parallel gateway sender with maximum queue memory = 75MB
- runs 1 client that loads 5000 entries

The servers are then stopped and restarted.

Histograms after the initial load show:

{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:         16824      103590000  [B
   2:         52167        4980688  [C
  15:          3756         390624  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl
  19:          5000         320000  org.apache.geode.internal.cache.entries.VersionedThinDiskRegionEntryHeapStringKey1
  22:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey
  40:          5003         120072  org.apache.geode.internal.cache.VMCachedDeserializable
Total        521194      126837648

 num     #instances         #bytes  class name
----------------------------------------------
   1:         24733      103782464  [B
   2:         51876        4943208  [C
   3:         12151        1365120  java.lang.Class
  15:          3756         390624  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl
  21:          5000         320000  org.apache.geode.internal.cache.entries.VersionedThinDiskRegionEntryHeapStringKey1
  25:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey
  43:          5001         120024  org.apache.geode.internal.cache.VMCachedDeserializable
Total        579438      128892808
{noformat}

These show that 1244 GatewaySenderEventImpls have been evicted (5000 - 3756).
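The eviction count can be read off a histogram mechanically. The following is only a rough sketch, assuming whitespace-separated rows in the "num: #instances #bytes class name" layout and assuming that the 5000 VMThinDiskLRURegionEntryHeapLongKey instances are the queue entries; the class and method names (HistogramCounts, instancesOf) are hypothetical helpers and not part of Geode or of the test:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Geode: reads instance counts out of
// class-histogram rows and derives how many queue events were evicted.
final class HistogramCounts {

    // Returns the #instances column of the first row whose class name
    // equals className, or -1 if no such row exists.
    static long instancesOf(List<String> rows, String className) {
        for (String row : rows) {
            // Expected columns: "rank:", #instances, #bytes, class name.
            String[] cols = row.trim().split("\\s+");
            if (cols.length == 4 && cols[3].equals(className)) {
                return Long.parseLong(cols[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two rows copied from the first histogram after the initial load.
        List<String> rows = Arrays.asList(
            "15: 3756 390624 org.apache.geode.internal.cache.wan.GatewaySenderEventImpl",
            "22: 5000 280000 org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey");

        // Assumption: each queue entry is one VMThinDiskLRURegionEntryHeapLongKey.
        long queueEntries = instancesOf(rows,
            "org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey");
        long eventsInMemory = instancesOf(rows,
            "org.apache.geode.internal.cache.wan.GatewaySenderEventImpl");

        // Entries whose event is no longer on the heap were evicted to disk.
        System.out.println("evicted=" + (queueEntries - eventsInMemory)); // prints "evicted=1244"
    }
}
```

Header, separator, and Total rows do not have four columns, so the helper skips them without trying to parse them.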

Histograms after recovery show:

{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:          8774      113618992  [B
   2:         47745        4728504  [C
  15:          5000         320000  org.apache.geode.internal.cache.entries.VersionedThinDiskRegionEntryHeapStringKey1
  19:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey
  32:          5003         120072  org.apache.geode.internal.cache.VMCachedDeserializable
  47:           505          52520  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl
Total        442436      133431416

 num     #instances         #bytes  class name
----------------------------------------------
   1:         13247      206607704  [B
   2:         47524        4688544  [C
  15:          5000         320000  org.apache.geode.internal.cache.entries.VersionedThinDiskRegionEntryHeapStringKey1
  18:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey
  22:         10002         240048  org.apache.geode.internal.cache.VMCachedDeserializable
  47:           505          52520  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl
Total        447847      226306728
{noformat}

The top histogram shows the GII provider; the bottom histogram shows the GII requester.

These show that the GII provider has only recovered keys, since there are 5003 VMCachedDeserializables. These also show that the GII requester has all the data in memory, since there are 10002 VMCachedDeserializables.

I added some logging to show the behavior in both the load and recovery cases.
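
As a reading aid for the load and recovery logs, here is a toy model of the accounting they suggest. It is a sketch under assumptions and not Geode's actual code: the class LruDeltaModel, its fields, the single-entry eviction, and the eviction condition are all hypothetical. The only behavior taken from the logs is that the load path records a per-thread delta before lruUpdateCallback, while the recovery path shows resetThreadLocals and no setDelta, so the callback reads a null delta and computes bytesToEvict=0.

```java
// Toy model only (hypothetical names, not Geode code): shows how a missing
// per-thread delta leads to bytesToEvict=0 and therefore to no eviction.
final class LruDeltaModel {

    // Stand-in for the per-thread size delta seen in the logs.
    private static final ThreadLocal<Integer> DELTA = new ThreadLocal<>();

    final long limitBytes;
    long trackedBytes; // what the eviction statistics believe is in memory
    long actualBytes;  // what the values really occupy in this model
    int evictions;

    LruDeltaModel(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Load path: the delta is recorded before the callback runs.
    void loadPut(int size) {
        actualBytes += size;
        DELTA.set(size);
        lruUpdateCallback(size);
    }

    // Recovery path: thread locals are reset and no delta is ever recorded.
    void recoveryPut(int size) {
        actualBytes += size;
        DELTA.remove();
        lruUpdateCallback(size);
    }

    private void lruUpdateCallback(int entrySize) {
        Integer recorded = DELTA.get();
        int bytesToEvict = (recorded == null) ? 0 : recorded;
        // Assumed condition: evict one equally sized value when the recorded
        // growth would push the tracked total over the limit.
        if (bytesToEvict > 0 && trackedBytes + bytesToEvict > limitBytes) {
            trackedBytes -= entrySize;
            actualBytes -= entrySize;
            evictions++;
        }
        trackedBytes += bytesToEvict;
        DELTA.remove();
    }

    public static void main(String[] args) {
        LruDeltaModel load = new LruDeltaModel(100);
        LruDeltaModel recovery = new LruDeltaModel(100);
        for (int i = 0; i < 10; i++) {
            load.loadPut(30);
            recovery.recoveryPut(30);
        }
        System.out.println("load: actual=" + load.actualBytes + " evictions=" + load.evictions);
        System.out.println("recovery: actual=" + recovery.actualBytes + " evictions=" + recovery.evictions);
    }
}
```

In this model the load path stays under the limit by evicting, while the recovery path never evicts and its tracked total stays at zero, which mirrors the delta=0 lines in the recovery log.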

Load case:

Here is a case where no eviction was necessary:

{noformat}
ServerConnection on port 59468 Thread 1: Put65.cmdExecute putting key=55
ServerConnection on port 59468 Thread 1: VMLRURegionMap.lruEntryUpdate region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; re=VMThinDiskLRURegionEntryHeapLongKey@2ec5e207 (key=114)
ServerConnection on port 59468 Thread 1: VMLRURegionMap.setDelta region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; lruDelta=20941
ServerConnection on port 59468 Thread 1: VMLRURegionMap.lruUpdateCallback region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; bytesToEvict=20941
ServerConnection on port 59468 Thread 1: VMLRURegionMap.changeTotalEntrySize region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; delta=20941
ServerConnection on port 59468 Thread 1: MemoryLRUStatistics.updateCounter delta=20941; total=1172686
ServerConnection on port 59468 Thread 1: Put65.cmdExecute done putting key=55
{noformat}

Here is a case where an entry was evicted:

{noformat}
ServerConnection on port 59468 Thread 2: Put65.cmdExecute putting key=4243
ServerConnection on port 59468 Thread 2: VMLRURegionMap.lruEntryUpdate region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; re=VMThinDiskLRURegionEntryHeapLongKey@76ce3670 (key=3850)
ServerConnection on port 59468 Thread 2: VMLRURegionMap.setDelta region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; lruDelta=20943
ServerConnection on port 59468 Thread 2: VMLRURegionMap.lruUpdateCallback region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; bytesToEvict=20943
ServerConnection on port 59468 Thread 2: MemoryLRUStatistics.updateCounter delta=-20887; total=78604522
ServerConnection on port 59468 Thread 2: VMLRURegionMap.lruUpdateCallback evicted region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; key=3850
ServerConnection on port 59468 Thread 2: VMLRURegionMap.changeTotalEntrySize region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; delta=20943
ServerConnection on port 59468 Thread 2: MemoryLRUStatistics.updateCounter delta=20943; total=78625465
ServerConnection on port 59468 Thread 2: Put65.cmdExecute done putting key=4243
{noformat}

Recovery case:

The logging looks the same for each entry processed during GII. Notice the two calls to VMLRURegionMap.resetThreadLocals and the call to VMLRURegionMap.getDelta with value=null. In the recovery case, the delta is not being tracked properly, so no eviction occurs.

{noformat}
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk about to process key=114
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk about to initialImagePut key=114; value=VMCachedDeserializable@239524336
Pooled High Priority Message Processor 14: VMLRURegionMap.resetThreadLocals
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk done initialImagePut key=114; value=VMCachedDeserializable@239524336
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk about to lruUpdateCallback key=114
Pooled High Priority Message Processor 14: VMLRURegionMap.getDelta value=null
Pooled High Priority Message Processor 14: VMLRURegionMap.lruUpdateCallback region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; bytesToEvict=0
Pooled High Priority Message Processor 14: VMLRURegionMap.resetThreadLocals
Pooled High Priority Message Processor 14: VMLRURegionMap.changeTotalEntrySize region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; delta=0
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk done lruUpdateCallback key=114
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk done processing key=114
{noformat}

> Gateway sender queues using heap memory way above configured value after server restart
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-8278
>                 URL: https://issues.apache.org/jira/browse/GEODE-8278
>             Project: Geode
>          Issue Type: Bug
>          Components: eviction
>            Reporter: Alberto Gomez
>            Assignee: Alberto Gomez
>            Priority: Major
>
> In a Geode system with the following characteristics:
> * WAN replication
> * redundant partitioned regions
> * overflow configured for the gateway sender queues by means of persistence
> and maximum queue memory set
> * gateway receivers stopped in one site (B)
> * operations sent to the site that does not have the gateway receivers
> stopped (A)
> When operations are sent to site A, the gateway sender queues start to grow
> as expected, and the heap memory consumed by the queues does not grow
> indefinitely, given that there is overflow to disk when the limit is reached.
> But if a server is restarted, the restarted server shows much higher heap
> memory usage than it had before the restart, and much higher than the other
> servers.
> This can even prevent the server from restarting if the heap memory it
> requires is above the configured limit.
> According to the memory analyzer, the entries taking up the memory are
> subclasses of ```VMThinDiskLRURegionEntryHeap```.
> The number of instances of this type is the same on the restarted server as
> on the servers that were not restarted, but on the restarted server they
> take much more memory. The reason seems to be that the ```value``` member
> attribute of the instances on the restarted server contains
> ```VMCachedDeserializable``` objects, while on the servers that were not
> restarted the attribute contains either ```null``` or
> ```GatewaySenderEventImpl``` objects, which use much less memory than the
> ```VMCachedDeserializable``` ones.
> If redundancy is not configured for the region, the problem does not
> appear, i.e. the heap memory used by the restarted server is similar to
> what it was prior to the restart.
> If the node that was not restarted is then restarted, the previously
> restarted node seems to release the extra memory (my guess is that it is
> processing the other process queue).
> Also, if traffic is sent again to the Geode cluster, eviction seems to kick
> in, and after a short time the memory of the restarted server goes down to
> the level it had before the restart.
> In summary, the problem seems to be that if a server does GII
> (getInitialImage) from another server, eviction does not occur for gateway
> sender queue entries.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)