[
https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakov Varenina updated GEODE-9854:
----------------------------------
Description:
Issue:
An OpLog files are compacted, but the .drf file is left because it contains
deletes ofentries in previous .crfs. The .crf file is deleted, but the orphaned
.drf is not until all
previous .crf files (.crfs with smaller id) are deleted.
The problem is that compacted Oplog object representing orphaned .drf file
holds a structure in memory (Oplog.regionMap) that contains information that is
not useful
after the compaction and it takes certain amount of memory. Besides, there is a
race condition in the code when creating .krf files that, depending on the
execution order,
could make the problem more severe (it could leave elements in the
pendingKrfTags structure on the regionMap and this could take up a significant
amount of memory).
This race condition usually happens when new Oplog is rolled out and previous
Oplog is immediately marked as eligible for compaction. Compaction and .krf
creation start at
the similar time and compactor cancels creation of .krf if it is executed
first. The pendingKrfTags structure is usually cleared when .krf file is
created, but sincecompaction canceled creation of .krf, the pendingKrfTags
structure remain in memory until Oplog representing orphaned .drf file is
deleted.
Below it can be see that actually .krf is never created for the orphaned .drf
Oplog object that has memory allocated in pendingKrfTags:
{code:java}
server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 <Replicate/Partition
Region Garbage Collector> tid=0x34] Created oplog#129 drf for disk store store1.
server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 <Replicate/Partition
Region Garbage Collector> tid=0x34] Created oplog#129 crf for disk store store1.
server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 <OplogCompactor
store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction
oplog id(s): oplog#129
server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 <OplogCompactor
store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates
in 41 ms
server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete Task4>
tid=0x5d] Deleted oplog#129 crf for disk store store1.
{code}
!screenshot-1.png|width=1123,height=268!
Below it can be seen that the log and heap dump of orphaned .drf Oplg that have
not pendingKrfTags allocated. This is because pendingKrfTags are deleted when
.krf is created.
{code:java}
server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 <Replicate/Partition
Region Garbage Collector> tid=0x34] Created oplog#130 drf for disk store store1.
server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 <Replicate/Partition
Region Garbage Collector> tid=0x34] Created oplog#130 crf for disk store store1.
server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 <Idle
OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 <OplogCompactor
store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] OplogCompactor for store1
compaction oplog id(s): oplog#130
server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 <OplogCompactor
store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] compaction did 9918
creates and updates in 64 ms
server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4>
tid=0x5d] Deleted oplog#130 crf for disk store store1.
server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete Task4>
tid=0x5d] Deleted oplog#130 krf for disk store store1.
{code}
was:
Issue:
An OpLog files are compacted, but the .drf file is left because it contains
deletes ofentries in previous .crfs. The .crf file is deleted, but the orphaned
.drf is not until all
previous .crf files (.crfs with smaller id) are deleted.
The problem is that compacted Oplog object representing orphaned .drf file
holds a structure in memory (Oplog.regionMap) that contains information that is
not useful
after the compaction and it takes certain amount of memory. Besides, there is a
race condition in the code when creating .krf files that, depending on the
execution order,
could make the problem more severe (it could leave elements in the
pendingKrfTags structure on the regionMap and this could take up a significant
amount of memory).
This race condition usually happens when new Oplog is rolled out and previous
Oplog is immediately marked as eligible for compaction. Compaction and .krf
creation start at
the similar time and compactor cancels creation of .krf if it is executed
first. The pendingKrfTags structure is usually cleared when .krf file is
created, but sincecompaction canceled creation of .krf, the pendingKrfTags
structure remain in memory until Oplog representing orphaned .drf file is
deleted.
> Orphaned .drf files causing memory leak
> ---------------------------------------
>
> Key: GEODE-9854
> URL: https://issues.apache.org/jira/browse/GEODE-9854
> Project: Geode
> Issue Type: Bug
> Reporter: Jakov Varenina
> Priority: Major
> Attachments: screenshot-1.png, server1.log
>
>
> Issue:
> An OpLog files are compacted, but the .drf file is left because it contains
> deletes ofentries in previous .crfs. The .crf file is deleted, but the
> orphaned .drf is not until all
> previous .crf files (.crfs with smaller id) are deleted.
> The problem is that compacted Oplog object representing orphaned .drf file
> holds a structure in memory (Oplog.regionMap) that contains information that
> is not useful
> after the compaction and it takes certain amount of memory. Besides, there is
> a race condition in the code when creating .krf files that, depending on the
> execution order,
> could make the problem more severe (it could leave elements in the
> pendingKrfTags structure on the regionMap and this could take up a
> significant amount of memory).
> This race condition usually happens when new Oplog is rolled out and previous
> Oplog is immediately marked as eligible for compaction. Compaction and .krf
> creation start at
> the similar time and compactor cancels creation of .krf if it is executed
> first. The pendingKrfTags structure is usually cleared when .krf file is
> created, but sincecompaction canceled creation of .krf, the pendingKrfTags
> structure remain in memory until Oplog representing orphaned .drf file is
> deleted.
> Below it can be see that actually .krf is never created for the orphaned .drf
> Oplog object that has memory allocated in pendingKrfTags:
> {code:java}
> server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129
> drf for disk store store1.
> server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129
> crf for disk store store1.
> server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 <OplogCompactor
> store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction
> oplog id(s): oplog#129
> server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 <OplogCompactor
> store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates
> in 41 ms
> server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete
> Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1.
> {code}
> !screenshot-1.png|width=1123,height=268!
> Below it can be seen that the log and heap dump of orphaned .drf Oplg that
> have not pendingKrfTags allocated. This is because pendingKrfTags are deleted
> when .krf is created.
> {code:java}
> server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130
> drf for disk store store1.
> server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1
> <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130
> crf for disk store store1.
> server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 <Idle
> OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1.
> server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 <OplogCompactor
> store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] OplogCompactor for
> store1 compaction oplog id(s): oplog#130
> server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 <OplogCompactor
> store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] compaction did 9918
> creates and updates in 64 ms
> server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete
> Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1.
> server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete
> Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1.
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)