[ 
https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darrel Schneider reassigned GEODE-9854:
---------------------------------------

    Assignee: Darrel Schneider

> Orphaned .drf files causing memory leak
> ---------------------------------------
>
>                 Key: GEODE-9854
>                 URL: https://issues.apache.org/jira/browse/GEODE-9854
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Jakov Varenina
>            Assignee: Darrel Schneider
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: screenshot-1.png, screenshot-2.png, server1.log
>
>
> Issue:
> An OpLog files are compacted, but the .drf file is left because it contains 
> deletes ofentries in previous .crfs. The .crf file is deleted, but the 
> orphaned .drf is not until all
> previous .crf files (.crfs with smaller id) are deleted.
> The problem is that compacted Oplog object representing orphaned .drf file 
> holds a structure in memory (Oplog.regionMap) that contains information that 
> is not useful
> after the compaction and it takes certain amount of memory. Besides, there is 
> a race condition in the code when creating .krf files that, depending on the 
> execution order,
> could make the problem more severe  (it could leave pendingKrfTags structure 
> on the regionMap and this could take up a significant amount of memory). This
> pendingKrfTags HashMap is actually empty, but consumes memory because it was 
> used previously and the size of the HashMap was not reduced after it is 
> cleared.
> This race condition usually happens when new Oplog is rolled out and previous 
> Oplog is immediately marked as eligible for compaction. Compaction and .krf 
> creation start at
> the similar time and compactor cancels creation of .krf if it is executed 
> first. The pendingKrfTags structure is usually cleared when .krf file is 
> created, but sincecompaction canceled creation of .krf, the pendingKrfTags 
> structure remain in memory until Oplog representing orphaned .drf file is 
> deleted.
> Below it can be see that actually .krf is never created for the orphaned .drf 
> Oplog object that has memory allocated in pendingKrfTags:
> {code:java}
> server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 
> drf for disk store store1.
> server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 
> crf for disk store store1.
> server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 <OplogCompactor 
> store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction 
> oplog id(s): oplog#129
> server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 <OplogCompactor 
> store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates 
> in 41 ms
> server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete 
> Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1.
> {code}
> !screenshot-1.png|width=1123,height=268!
> Below you can see the log and heap dump of orphaned .drf Oplg that dont have 
> pendingKrfTags allocated in memory. This is because pendingKrfTags is cleared 
> when .krf is created as can be seen in below logs.
> {code:java}
> server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 
> drf for disk store store1.
> server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 
> crf for disk store store1.
> server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 <Idle 
> OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
> server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 <OplogCompactor 
> store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] OplogCompactor for 
> store1 compaction oplog id(s): oplog#130
> server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 <OplogCompactor 
> store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] compaction did 9918 
> creates and updates in 64 ms
> server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete 
> Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1.
> server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete 
> Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1.
> {code}
> !screenshot-2.png|width=1123,height=268!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to