[jira] [Commented] (GEODE-8278) Gateway sender queues using heap memory way above configured value after server restart

Barrett Oglesby (Jira) Wed, 16 Dec 2020 14:11:42 -0800


    [ https://issues.apache.org/jira/browse/GEODE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250671#comment-17250671 ]


Barrett Oglesby commented on GEODE-8278:
----------------------------------------

I have a test that:

- starts 2 servers each with:
  - persistent data partitioned region
  - persistent parallel gateway sender with maximum queue memory = 75MB
- runs 1 client that loads 5000 entries

The servers are then stopped and restarted.

Histograms after the initial load show:
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:         16824      103590000  [B
   2:         52167        4980688  [C
  15:          3756         390624  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl
  19:          5000         320000  org.apache.geode.internal.cache.entries.VersionedThinDiskRegionEntryHeapStringKey1
  22:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey
  40:          5003         120072  org.apache.geode.internal.cache.VMCachedDeserializable
Total        521194      126837648

 num     #instances         #bytes  class name
----------------------------------------------
   1:         24733      103782464  [B
   2:         51876        4943208  [C
   3:         12151        1365120  java.lang.Class
  15:          3756         390624  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl
  21:          5000         320000  org.apache.geode.internal.cache.entries.VersionedThinDiskRegionEntryHeapStringKey1
  25:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey
  43:          5001         120024  org.apache.geode.internal.cache.VMCachedDeserializable
Total        579438      128892808
{noformat}
These show that 1244 GatewaySenderEventImpls have been evicted (5000-3756).
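As a rough illustration of how that number falls out of the histogram, here is a minimal Java sketch. It is not part of Geode; the class and method names (HistogramCounts, instancesByClass) are hypothetical, and it assumes each histogram row sits on a single line in the form "rank: instances bytes class". It takes the VMThinDiskLRURegionEntryHeapLongKey count as the number of queue entries, which matches the region entry type that appears for the queue regions in the logging further down.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HistogramCounts {

    /** Maps class name to instance count for rows shaped "rank: instances bytes class". */
    static Map<String, Long> instancesByClass(String histogram) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : histogram.split("\\R")) {
            String[] fields = line.trim().split("\\s+");
            // Header, separator and Total rows do not start with a "rank:" field.
            if (fields.length == 4 && fields[0].endsWith(":")) {
                counts.put(fields[3], Long.parseLong(fields[1]));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two rows copied from the first histogram after the initial load.
        String histogram =
            "  15:          3756         390624  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl\n"
          + "  22:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey\n"
          + "Total        521194      126837648\n";
        Map<String, Long> counts = instancesByClass(histogram);
        long queueEntries =
            counts.get("org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey");
        long eventsOnHeap =
            counts.get("org.apache.geode.internal.cache.wan.GatewaySenderEventImpl");
        // Queue entries whose event is no longer on the heap.
        System.out.println("evicted=" + (queueEntries - eventsOnHeap)); // prints evicted=1244
    }
}
```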

Histograms after recovery show:
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:          8774      113618992  [B
   2:         47745        4728504  [C
  15:          5000         320000  org.apache.geode.internal.cache.entries.VersionedThinDiskRegionEntryHeapStringKey1
  19:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey
  32:          5003         120072  org.apache.geode.internal.cache.VMCachedDeserializable
  47:           505          52520  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl
Total        442436      133431416

 num     #instances         #bytes  class name
----------------------------------------------
   1:         13247      206607704  [B
   2:         47524        4688544  [C
  15:          5000         320000  org.apache.geode.internal.cache.entries.VersionedThinDiskRegionEntryHeapStringKey1
  18:          5000         280000  org.apache.geode.internal.cache.entries.VMThinDiskLRURegionEntryHeapLongKey
  22:         10002         240048  org.apache.geode.internal.cache.VMCachedDeserializable
  47:           505          52520  org.apache.geode.internal.cache.wan.GatewaySenderEventImpl
Total        447847      226306728
{noformat}
The top histogram shows the GII provider; the bottom histogram shows the GII 
requester.

These show that the GII provider has recovered only keys, since there are 5003 
VMCachedDeserializables.

These also show that the GII requester has all the data in memory, since there 
are 10002 VMCachedDeserializables.

I added some logging to show the behavior in both the load and recovery cases.

Load case:

Here is a case where no eviction was necessary:
{noformat}
ServerConnection on port 59468 Thread 1: Put65.cmdExecute putting key=55
ServerConnection on port 59468 Thread 1: VMLRURegionMap.lruEntryUpdate region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; re=VMThinDiskLRURegionEntryHeapLongKey@2ec5e207 (key=114)
ServerConnection on port 59468 Thread 1: VMLRURegionMap.setDelta region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; lruDelta=20941
ServerConnection on port 59468 Thread 1: VMLRURegionMap.lruUpdateCallback region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; bytesToEvict=20941
ServerConnection on port 59468 Thread 1: VMLRURegionMap.changeTotalEntrySize region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; delta=20941
ServerConnection on port 59468 Thread 1: MemoryLRUStatistics.updateCounter delta=20941; total=1172686
ServerConnection on port 59468 Thread 1: Put65.cmdExecute done putting key=55
{noformat}
Here is a case where an entry was evicted:
{noformat}
ServerConnection on port 59468 Thread 2: Put65.cmdExecute putting key=4243
ServerConnection on port 59468 Thread 2: VMLRURegionMap.lruEntryUpdate region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; re=VMThinDiskLRURegionEntryHeapLongKey@76ce3670 (key=3850)
ServerConnection on port 59468 Thread 2: VMLRURegionMap.setDelta region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; lruDelta=20943
ServerConnection on port 59468 Thread 2: VMLRURegionMap.lruUpdateCallback region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; bytesToEvict=20943
ServerConnection on port 59468 Thread 2: MemoryLRUStatistics.updateCounter delta=-20887; total=78604522
ServerConnection on port 59468 Thread 2: VMLRURegionMap.lruUpdateCallback evicted region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; key=3850
ServerConnection on port 59468 Thread 2: VMLRURegionMap.changeTotalEntrySize region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_8; delta=20943
ServerConnection on port 59468 Thread 2: MemoryLRUStatistics.updateCounter delta=20943; total=78625465
ServerConnection on port 59468 Thread 2: Put65.cmdExecute done putting key=4243
{noformat}
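The counter arithmetic in the eviction log can be replayed with a small Java model. This is only a sketch under stated assumptions, not Geode's implementation: the class MemoryLruSketch is hypothetical, the limit is assumed to be 75 * 1024 * 1024 bytes (the log does not print the limit), and eviction is simplified to dropping the oldest entry's size from the total, whereas Geode overflows the value to disk and keeps the entry. The seed values are derived from the totals in the log.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class MemoryLruSketch {
    private final long limitBytes;
    private final Deque<Long> entrySizes = new ArrayDeque<>(); // oldest first
    private long totalBytes;
    int evictions;

    MemoryLruSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Accounts for a new entry, evicting oldest entries first if the limit would be exceeded. */
    void put(long sizeBytes) {
        while (totalBytes + sizeBytes > limitBytes && !entrySizes.isEmpty()) {
            totalBytes -= entrySizes.removeFirst(); // like updateCounter with a negative delta
            evictions++;
        }
        entrySizes.addLast(sizeBytes);
        totalBytes += sizeBytes; // like updateCounter with a positive delta
    }

    long total() {
        return totalBytes;
    }

    public static void main(String[] args) {
        MemoryLruSketch queue = new MemoryLruSketch(75L * 1024 * 1024); // assumed limit
        queue.put(20887);      // oldest entry, the one evicted in the log (key=3850)
        queue.put(78604522);   // stand-in for everything else already counted
        queue.put(20943);      // the put of key=4243
        System.out.println("evictions=" + queue.evictions + " total=" + queue.total());
        // prints evictions=1 total=78625465
    }
}
```

Under that assumed limit, the total before the put (78625409) plus the new delta (20943) would exceed 78643200, which is consistent with exactly one eviction in the log.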
Recovery case:

The logging looks the same for each entry processed during GII. Notice the two 
calls to VMLRURegionMap.resetThreadLocals and the call to 
VMLRURegionMap.getDelta with value=null. In the recovery case, the delta is not 
being tracked properly so no eviction occurs.
{noformat}
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk about to process key=114
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk about to initialImagePut key=114; value=VMCachedDeserializable@239524336
Pooled High Priority Message Processor 14: VMLRURegionMap.resetThreadLocals
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk done initialImagePut key=114; value=VMCachedDeserializable@239524336
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk about to lruUpdateCallback key=114
Pooled High Priority Message Processor 14: VMLRURegionMap.getDelta value=null
Pooled High Priority Message Processor 14: VMLRURegionMap.lruUpdateCallback region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; bytesToEvict=0
Pooled High Priority Message Processor 14: VMLRURegionMap.resetThreadLocals
Pooled High Priority Message Processor 14: VMLRURegionMap.changeTotalEntrySize region=/__PR/_B__ny__PARALLEL__GATEWAY__SENDER__QUEUE_1; delta=0
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk done lruUpdateCallback key=114
Pooled High Priority Message Processor 14: InitialImageOperation.processChunk done processing key=114
{noformat}
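The difference between the two call sequences can be modelled with a thread-local delta. The Java below is a toy sketch, not the VMLRURegionMap code: the class LruDeltaSketch and its loadPath and recoveryPath methods are hypothetical, and only the ordering of calls is taken from the logging above. It assumes the callback treats a missing delta as 0, which is what the bytesToEvict=0 line suggests.

```java
final class LruDeltaSketch {
    // Per-thread size delta, recorded by the put path and consumed by the callback.
    private static final ThreadLocal<Integer> DELTA = new ThreadLocal<>();

    static void setDelta(int bytes) {
        DELTA.set(bytes);
    }

    static void resetThreadLocals() {
        DELTA.remove();
    }

    /** Returns the bytes the callback accounts for; 0 when no delta is present. */
    static int lruUpdateCallback() {
        Integer delta = DELTA.get();
        int bytesToEvict = (delta == null) ? 0 : delta;
        resetThreadLocals();
        return bytesToEvict;
    }

    /** Load case: the delta is set and still present when the callback runs. */
    static int loadPath(int entrySize) {
        setDelta(entrySize);
        return lruUpdateCallback();
    }

    /** Recovery case: the thread-locals are reset before the callback reads the delta. */
    static int recoveryPath() {
        resetThreadLocals();        // logged during initialImagePut
        return lruUpdateCallback(); // sees no delta, like getDelta value=null
    }

    public static void main(String[] args) {
        System.out.println("load bytesToEvict=" + loadPath(20941));     // prints load bytesToEvict=20941
        System.out.println("recovery bytesToEvict=" + recoveryPath());  // prints recovery bytesToEvict=0
    }
}
```

With a delta of 0 the entry's size never reaches the counter, so in this model the limit is never seen as exceeded and nothing is evicted.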


> Gateway sender queues using heap memory way above configured value after 
> server restart
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-8278
>                 URL: https://issues.apache.org/jira/browse/GEODE-8278
>             Project: Geode
>          Issue Type: Bug
>          Components: eviction
>            Reporter: Alberto Gomez
>            Assignee: Alberto Gomez
>            Priority: Major
>
> In a Geode system with the following characteristics:
>  * WAN replication
>  * redundant partitioned regions
>  * overflow configured for the gateway sender queues by enabling persistence 
> and setting maximum queue memory
>  * gateway receivers stopped in one site (B)
>  * Operations sent to the site that does not have the gateway receivers 
> stopped (A)
> When operations are sent to site A, the gateway sender queues start to grow 
> as expected, and the heap memory consumed by the queues does not grow 
> indefinitely because entries overflow to disk when the limit is reached.
> But if a server is restarted, the restarted server shows much higher heap 
> memory usage than it had before the restart or than the other servers have.
> This can even prevent the server from restarting if the heap memory it 
> requires is above the configured limit.
> According to the memory analyzer, the entries taking up the memory are 
> subclasses of ```VMThinDiskLRURegionEntryHeap```.
> The number of instances of this type is the same on the restarted server as 
> on the servers that were not restarted, but on the restarted server they take 
> much more memory. The reason seems to be that the ```value``` member attribute 
> of the instances contains ```VMCachedDeserializable``` objects on the 
> restarted server, while on the servers that were not restarted it contains 
> either ```null``` or ```GatewaySenderEventImpl``` objects, which use much less 
> memory than the ```VMCachedDeserializable``` ones.
>  If redundancy is not configured for the region, the problem does not 
> appear, i.e. the heap memory used by the restarted server is similar to 
> what it was before the restart.
> If the node that was not restarted is then restarted, the previously 
> restarted node seems to release the extra memory (my guess is that it is 
> processing the other process's queue).
> Also, if traffic is sent again to the Geode cluster, eviction seems to kick 
> in, and after a short time the memory of the restarted server goes down to 
> the level it had before the restart.
> As a summary, the problem seems to be that if a server does GII 
> (getInitialImage) from another server, eviction does not occur for gateway 
> sender queue entries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
