[ https://issues.apache.org/jira/browse/GEODE-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen Nichols closed GEODE-9854. ------------------------------- > Orphaned .drf files causing memory leak > --------------------------------------- > > Key: GEODE-9854 > URL: https://issues.apache.org/jira/browse/GEODE-9854 > Project: Geode > Issue Type: Bug > Reporter: Jakov Varenina > Assignee: Jakov Varenina > Priority: Major > Labels: pull-request-available > Fix For: 1.15.0 > > Attachments: screenshot-1.png, screenshot-2.png, server1.log > > > Issue: > An OpLog files are compacted, but the .drf file is left because it contains > deletes ofentries in previous .crfs. The .crf file is deleted, but the > orphaned .drf is not until all > previous .crf files (.crfs with smaller id) are deleted. > The problem is that compacted Oplog object representing orphaned .drf file > holds a structure in memory (Oplog.regionMap) that contains information that > is not useful > after the compaction and it takes certain amount of memory. Besides, there is > a race condition in the code when creating .krf files that, depending on the > execution order, > could make the problem more severe (it could leave pendingKrfTags structure > on the regionMap and this could take up a significant amount of memory). This > pendingKrfTags HashMap is actually empty, but consumes memory because it was > used previously and the size of the HashMap was not reduced after it is > cleared. > This race condition usually happens when new Oplog is rolled out and previous > Oplog is immediately marked as eligible for compaction. Compaction and .krf > creation start at > the similar time and compactor cancels creation of .krf if it is executed > first. The pendingKrfTags structure is usually cleared when .krf file is > created, but sincecompaction canceled creation of .krf, the pendingKrfTags > structure remain in memory until Oplog representing orphaned .drf file is > deleted. > Below it can be see that actually .krf is never created for the orphaned .drf > Oplog object that has memory allocated in pendingKrfTags: > {code:java} > server1.log:1956:[info 2021/11/25 21:52:26.866 CET server1 > <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 > drf for disk store store1. > server1.log:1958:[info 2021/11/25 21:52:26.867 CET server1 > <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#129 > crf for disk store store1. > server1.log:1974:[info 2021/11/25 21:52:39.490 CET server1 <OplogCompactor > store1 for oplog oplog#129> tid=0x5c] OplogCompactor for store1 compaction > oplog id(s): oplog#129 > server1.log:1980:[info 2021/11/25 21:52:39.532 CET server1 <OplogCompactor > store1 for oplog oplog#129> tid=0x5c] compaction did 3685 creates and updates > in 41 ms > server1.log:1982:[info 2021/11/25 21:52:39.532 CET server1 <Oplog Delete > Task4> tid=0x5d] Deleted oplog#129 crf for disk store store1. > {code} > !screenshot-1.png|width=1123,height=268! > Below you can see the log and heap dump of orphaned .drf Oplg that dont have > pendingKrfTags allocated in memory. This is because pendingKrfTags is cleared > when .krf is created as can be seen in below logs. > {code:java} > server1.log:1976:[info 2021/11/25 21:52:39.491 CET server1 > <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 > drf for disk store store1. > server1.log:1978:[info 2021/11/25 21:52:39.493 CET server1 > <Replicate/Partition Region Garbage Collector> tid=0x34] Created oplog#130 > crf for disk store store1. > server1.log:1998:[info 2021/11/25 21:52:41.131 CET server1 <Idle > OplogCompactor> tid=0x5c] Created oplog#130 krf for disk store store1. > server1.log:2000:[info 2021/11/25 21:52:41.893 CET server1 <OplogCompactor > store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] OplogCompactor for > store1 compaction oplog id(s): oplog#130 > server1.log:2002:[info 2021/11/25 21:52:41.958 CET server1 <OplogCompactor > store1 for oplog oplog#130> tid=0x5c|#130> tid=0x5c] compaction did 9918 > creates and updates in 64 ms > server1.log:2004:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete > Task4> tid=0x5d] Deleted oplog#130 crf for disk store store1. > server1.log:2006:[info 2021/11/25 21:52:41.958 CET server1 <Oplog Delete > Task4> tid=0x5d] Deleted oplog#130 krf for disk store store1. > {code} > !screenshot-2.png|width=1123,height=268! -- This message was sent by Atlassian Jira (v8.20.7#820007)