[ 
https://issues.apache.org/jira/browse/GEODE-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472666#comment-17472666
 ] 

ASF subversion and git services commented on GEODE-9881:
--------------------------------------------------------

Commit c0fbe309ded8e1b53b048ff80a1892eb6a1285ff in geode's branch 
refs/heads/develop from Jakov Varenina
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=c0fbe30 ]

GEODE-9881: Oplog not compacted after recovery (#7193)

* GEODE-9881: Oplog not compacted after recovery

> Fully recovered Oplogs object indicating unrecoveredRegionCount>0 preventing 
> compaction
> --------------------------------------------------------------------------------------
>
>                 Key: GEODE-9881
>                 URL: https://issues.apache.org/jira/browse/GEODE-9881
>             Project: Geode
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major
>              Labels: pull-request-available
>
> We have found a problem that occurs when a region is closed with 
> Region.close() and then recreated to start recovery. If you inspect the 
> following code in the close() function, you will notice that it does not 
> make sense:
> {code:java}
>   void close(DiskRegion dr) {
>     // while a krf is being created can not close a region
>     lockCompactor();
>     try {
>       if (!isDrfOnly()) {
>         DiskRegionInfo dri = getDRI(dr);
>         if (dri != null) {
>           long clearCount = dri.clear(null);
>           if (clearCount != 0) {
>             totalLiveCount.addAndGet(-clearCount);
>             // no need to call handleNoLiveValues because we now have an
>             // unrecovered region.
>           }
>           regionMap.get().remove(dr.getId(), dri);
>         }
>         addUnrecoveredRegion(dr.getId());
>       }
>     } finally {
>       unlockCompactor();
>     }
>   }
> {code}
> Note that addUnrecoveredRegion() marks the DiskRegionInfo object as 
> unrecovered and increments the unrecoveredRegionCount counter. This 
> DiskRegionInfo object is held in the regionMap structure. close() then 
> removes the DiskRegionInfo object (the one previously marked as unrecovered) 
> from regionMap. This makes no sense: the object is updated and then removed 
> from the map, left to be garbage collected. As shown later, this causes 
> problems when the region is recovered.
> Consider this code, which runs at recovery:
> {code:java}
> /**
>  * For each dri that this oplog has that is currently unrecoverable check to
>  * see if a DiskRegion that is recoverable now exists.
>  */
> void checkForRecoverableRegion(DiskRegionView dr) {
>   if (unrecoveredRegionCount.get() > 0) {
>     DiskRegionInfo dri = getDRI(dr);
>     if (dri != null) {
>       if (dri.testAndSetRecovered(dr)) {
>         unrecoveredRegionCount.decrementAndGet();
>       }
>     }
>   }
> }
> {code}
> The problem is that Geode does not clear the unrecoveredRegionCount counter 
> in Oplog objects once recovery is done. checkForRecoverableRegion checks the 
> unrecoveredRegionCount counter and calls testAndSetRecovered, but 
> testAndSetRecovered always returns false, because none of the DiskRegionInfo 
> objects in the region map have the unrecovered flag set to true (all objects 
> marked as unrecovered were deleted by close() and then recreated during 
> recovery; see the note below). As a result, all Oplogs are fully recovered 
> while the counter still incorrectly indicates unrecoveredRegionCount>0. This 
> later prevents compaction of the recovered Oplogs (the files that have .crf, 
> .drf and .krf) when they reach the compaction threshold.
> Note: During recovery, regionMap is recreated from the Oplog files. Since 
> all DiskRegionInfo objects are deleted from regionMap during close(), they 
> are recreated by the initRecoveredEntry function during recovery. Every 
> DiskRegionInfo is created with the unrecovered flag set to false.
>  
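
The counter leak described in the issue can be reproduced with a small standalone model. This is only a sketch, not Geode code: OplogCounterSketch, RegionInfo and recreateDuringRecovery are hypothetical stand-ins for Oplog, DiskRegionInfo and the initRecoveredEntry path, and close() here models the sequence as the description words it (flag the entry and increment the counter, then drop the flagged entry from the map).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the bookkeeping described in GEODE-9881; not Geode code.
class OplogCounterSketch {

  /** Stand-in for DiskRegionInfo that tracks only the unrecovered flag. */
  static final class RegionInfo {
    private boolean unrecovered;

    /** Returns true only if the flag changed from recovered to unrecovered. */
    boolean testAndSetUnrecovered() {
      if (unrecovered) {
        return false;
      }
      unrecovered = true;
      return true;
    }

    /** Returns true only if the flag changed from unrecovered to recovered. */
    boolean testAndSetRecovered() {
      if (!unrecovered) {
        return false;
      }
      unrecovered = false;
      return true;
    }
  }

  final Map<Long, RegionInfo> regionMap = new ConcurrentHashMap<>();
  final AtomicInteger unrecoveredRegionCount = new AtomicInteger();

  /** Flags the entry and bumps the counter, then removes the flagged entry. */
  void close(long regionId) {
    RegionInfo info = regionMap.computeIfAbsent(regionId, id -> new RegionInfo());
    if (info.testAndSetUnrecovered()) {
      unrecoveredRegionCount.incrementAndGet();
    }
    regionMap.remove(regionId, info);
  }

  /** Models recovery recreating the entry with the unrecovered flag set to false. */
  void recreateDuringRecovery(long regionId) {
    regionMap.put(regionId, new RegionInfo());
  }

  /** Same shape as checkForRecoverableRegion in the issue description. */
  void checkForRecoverableRegion(long regionId) {
    if (unrecoveredRegionCount.get() > 0) {
      RegionInfo info = regionMap.get(regionId);
      if (info != null && info.testAndSetRecovered()) {
        unrecoveredRegionCount.decrementAndGet();
      }
    }
  }

  public static void main(String[] args) {
    OplogCounterSketch oplog = new OplogCounterSketch();
    oplog.recreateDuringRecovery(1L);     // region initially present and recovered
    oplog.close(1L);                      // counter goes to 1, flagged entry is dropped
    oplog.recreateDuringRecovery(1L);     // fresh entry, flag is false
    oplog.checkForRecoverableRegion(1L);  // finds nothing to un-flag
    System.out.println("unrecoveredRegionCount = " + oplog.unrecoveredRegionCount.get());
  }
}
```

In this model the counter is still 1 after the recovery check, because the only entry that carried the unrecovered flag was discarded in close(). A counter stuck above zero is the state that the description says blocks compaction.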



