[ 
https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina updated GEODE-9854:
----------------------------------
    Description: 
Issue:
An OpLog files are compacted, but the .drf file is left because it contains 
deletes ofentries in previous .crfs. The .crf file is deleted, but the orphaned 
.drf is not until all
previous .crf files (.crfs with smaller id) are deleted.

The problem is that compacted Oplog object representing orphaned .drf file 
holds a structure in memory (Oplog.regionMap) that contains information that is 
not useful
after the compaction and it takes certain amount of memory. Besides, there is a 
race condition in the code when creating .krf files that, depending on the 
execution order,
could make the problem more severe (it could leave elements in the 
pendingKrfTags structure on the regionMap and this could take up a significant 
amount of memory).
This race condition usually happens when new Oplog is rolled out and previous 
Oplog is immediately marked as eligible for compaction. Compaction and .krf 
creation start at
the similar time and compactor cancels creation of .krf if it is executed 
first. The pendingKrfTags structure is usually cleared when .krf file is 
created, but sincecompaction canceled creation of .krf, the pendingKrfTags 
structure remain in memory until Oplog representing orphaned .drf file is 
deleted.

Below it can be see that actually .krf is never created for the orphaned .drf 
Oplog object that has memory allocated in pendingKrfTags:
{code:java}
server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 <Replicate/Partition 
Region Garbage Collector> tid=0x34] Created oplog#129 drf for disk store store1.
server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 <Replicate/Partition 
Region Garbage Collector> tid=0x34] Created oplog#129 crf for disk store store1.
server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 <OplogCompactor 
store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction 
oplog id(s): oplog#129
server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 <OplogCompactor 
store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates 
in 41 ms
server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete Task4> 
tid=0x5d] Deleted oplog#129 crf for disk store store1.
{code}

!screenshot-1.png|width=1123,height=268! 


Below it can be seen that the log and heap dump of orphaned .drf Oplg that have 
not pendingKrfTags allocated. This is because pendingKrfTags are deleted when 
.krf is created.


{code:java}
server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 <Replicate/Partition 
Region Garbage Collector> tid=0x34] Created oplog#130 drf for disk store store1.
server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 <Replicate/Partition 
Region Garbage Collector> tid=0x34] Created oplog#130 crf for disk store store1.
server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 <Idle 
OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 <OplogCompactor 
store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] OplogCompactor for store1 
compaction oplog id(s): oplog#130
server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 <OplogCompactor 
store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] compaction did 9918 
creates and updates in 64 ms
server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4> 
tid=0x5d] Deleted oplog#130 crf for disk store store1.
server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4> 
tid=0x5d] Deleted oplog#130 krf for disk store store1.
{code}


  was:
Issue:
An OpLog files are compacted, but the .drf file is left because it contains 
deletes ofentries in previous .crfs. The .crf file is deleted, but the orphaned 
.drf is not until all
previous .crf files (.crfs with smaller id) are deleted.

The problem is that compacted Oplog object representing orphaned .drf file 
holds a structure in memory (Oplog.regionMap) that contains information that is 
not useful
after the compaction and it takes certain amount of memory. Besides, there is a 
race condition in the code when creating .krf files that, depending on the 
execution order,
could make the problem more severe (it could leave elements in the 
pendingKrfTags structure on the regionMap and this could take up a significant 
amount of memory).
This race condition usually happens when new Oplog is rolled out and previous 
Oplog is immediately marked as eligible for compaction. Compaction and .krf 
creation start at
the similar time and compactor cancels creation of .krf if it is executed 
first. The pendingKrfTags structure is usually cleared when .krf file is 
created, but sincecompaction canceled creation of .krf, the pendingKrfTags 
structure remain in memory until Oplog representing orphaned .drf file is 
deleted.


> Orphaned .drf files causing memory leak
> ---------------------------------------
>
>                 Key: GEODE-9854
>                 URL: https://issues.apache.org/jira/browse/GEODE-9854
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Jakov Varenina
>            Priority: Major
>         Attachments: screenshot-1.png, server1.log
>
>
> Issue:
> An OpLog files are compacted, but the .drf file is left because it contains 
> deletes ofentries in previous .crfs. The .crf file is deleted, but the 
> orphaned .drf is not until all
> previous .crf files (.crfs with smaller id) are deleted.
> The problem is that compacted Oplog object representing orphaned .drf file 
> holds a structure in memory (Oplog.regionMap) that contains information that 
> is not useful
> after the compaction and it takes certain amount of memory. Besides, there is 
> a race condition in the code when creating .krf files that, depending on the 
> execution order,
> could make the problem more severe (it could leave elements in the 
> pendingKrfTags structure on the regionMap and this could take up a 
> significant amount of memory).
> This race condition usually happens when new Oplog is rolled out and previous 
> Oplog is immediately marked as eligible for compaction. Compaction and .krf 
> creation start at
> the similar time and compactor cancels creation of .krf if it is executed 
> first. The pendingKrfTags structure is usually cleared when .krf file is 
> created, but sincecompaction canceled creation of .krf, the pendingKrfTags 
> structure remain in memory until Oplog representing orphaned .drf file is 
> deleted.
> Below it can be see that actually .krf is never created for the orphaned .drf 
> Oplog object that has memory allocated in pendingKrfTags:
> {code:java}
> server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 
> drf for disk store store1.
> server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 
> crf for disk store store1.
> server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 <OplogCompactor 
> store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction 
> oplog id(s): oplog#129
> server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 <OplogCompactor 
> store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates 
> in 41 ms
> server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete 
> Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1.
> {code}
> !screenshot-1.png|width=1123,height=268! 
> Below it can be seen that the log and heap dump of orphaned .drf Oplg that 
> have not pendingKrfTags allocated. This is because pendingKrfTags are deleted 
> when .krf is created.
> {code:java}
> server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 
> drf for disk store store1.
> server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 
> crf for disk store store1.
> server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 <Idle 
> OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
> server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 <OplogCompactor 
> store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] OplogCompactor for 
> store1 compaction oplog id(s): oplog#130
> server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 <OplogCompactor 
> store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] compaction did 9918 
> creates and updates in 64 ms
> server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete 
> Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1.
> server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete 
> Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to