I think that #1 is a good start...

I think we should have all regions created before we recover from the files, or at least the regions that use the disk store being recovered.

When we recover, do we recover in reverse order? The latest data *should* be in the last data files, so recovering in reverse might give us the latest data first.
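To make the reverse-order question concrete, here is a minimal Java sketch. It assumes a hypothetical `Op` record (a key plus a value, with `null` meaning destroy) and models each data file as a list of operations ordered oldest to newest. It is not Geode's oplog format or recovery code. Walking newest-first means the first operation seen for a key wins, so older values for that key never have to be loaded.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ReverseRecovery {
    // Hypothetical on-disk operation; a null value stands for a destroy.
    record Op(String key, String value) {}

    // Walks the files newest-first, and each file's operations last-to-first,
    // so the first operation seen for a key is its most recent one.
    static Map<String, String> recoverNewestFirst(List<List<Op>> filesOldestFirst) {
        Map<String, String> recovered = new HashMap<>();
        Set<String> seen = new HashSet<>();
        for (int f = filesOldestFirst.size() - 1; f >= 0; f--) {
            List<Op> ops = filesOldestFirst.get(f);
            for (int i = ops.size() - 1; i >= 0; i--) {
                Op op = ops.get(i);
                if (!seen.add(op.key())) {
                    continue; // a newer operation for this key was already applied
                }
                if (op.value() != null) {
                    recovered.put(op.key(), op.value());
                }
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<List<Op>> files = List.of(
                List.of(new Op("a", "a1"), new Op("b", "b1")),
                List.of(new Op("a", "a2"), new Op("b", null), new Op("c", "c1")));
        System.out.println(new TreeMap<>(recoverNewestFirst(files))); // {a=a2, c=c1}
    }
}
```

One catch with this order: a destroy in a newer file has to be remembered (the `seen` set) so that an older value for the same key is not resurrected.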

In addition, could we treat a restart as a "compaction"? Recovered data would be inserted into the already-created region. If that region needs to overflow, the data overflows to a "new", clean set of files. Once the system has restarted, the previous disk store files can be deleted.
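A minimal Java sketch of the restart-as-compaction idea. It assumes entry-count LRU as a stand-in for heap LRU, and plain in-memory maps as stand-ins for the region and the new file set; the `Op` record and `CompactingRecovery` class are hypothetical, not Geode classes.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompactingRecovery {
    // Hypothetical on-disk operation; a null value stands for a destroy.
    record Op(String key, String value) {}

    private final int maxInMemory;
    // Stand-in for the new, clean overflow file set.
    final Map<String, String> newOverflowFiles = new HashMap<>();
    // Stand-in for the already-created region: an access-ordered LRU map
    // that moves its least recently used entry to the new files when full.
    final LinkedHashMap<String, String> region;

    CompactingRecovery(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        this.region = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > CompactingRecovery.this.maxInMemory) {
                    newOverflowFiles.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    // Replays the old files oldest-first into the region. Every live entry
    // ends up either in memory or in the new files, so once this returns
    // the old files hold nothing that is still needed.
    void recover(List<List<Op>> oldFilesOldestFirst) {
        for (List<Op> file : oldFilesOldestFirst) {
            for (Op op : file) {
                newOverflowFiles.remove(op.key()); // drop any stale overflowed copy
                if (op.value() == null) {
                    region.remove(op.key());
                } else {
                    region.put(op.key(), op.value());
                }
            }
        }
    }
}
```

The property that matters is that nothing is read back from the old files after the replay, which is what would make them safe to delete.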

I hope that makes sense...

On 1/17/17 14:41, Anilkumar Gingade wrote:
Hi Geode Devs,

We are working on ticket GEODE-1672, which concerns out-of-memory errors
during recovery of overflow regions (with heap LRU eviction configured).

https://issues.apache.org/jira/browse/GEODE-1672

When recovering the persistent files, Geode stores the values in temporary
maps for the regions using a background thread. Because these maps are not
actual regions, they are not considered for LRU eviction, which causes the
system to run out of memory (OOM).

We are considering the following approaches to address this issue. Let us
know if you have any comments or suggestions about them.

1. Skip recovering the regions marked with LRU eviction.
- This keeps the code changes minimal.
- The first access to the most recently used values will be expensive.
But this is true even if the values are recovered, as Geode doesn't
guarantee that the most recently used values will be in memory after recovery.
- This may impact use cases where regions are configured with LRU eviction
even though there is no memory pressure (the system is configured to handle
unexpected events).

2. Include the temp maps (these are AbstractRegionMap instances) in eviction
during recovery.
- May involve a lot of code change. The size estimation code in bucket
regions would need to be moved to AbstractRegionMap.
- The recovery thread would need to be throttled based on the eviction
rate, which could impact the recovery of regions without eviction.
We could consider overriding the default eviction rate during recovery.
- The regions will be in a similar state (number of entries) once the system
is recovered.

3. Stop recovery when the system hits the critical heap threshold.
- This requires setting (or recommending) critical-heap-percentage. A
LowMemoryException would be thrown during recovery if the system is low on
memory.
- This may impact the first read on a region whose values were not
recovered.
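For approach 2, a minimal Java sketch of what eviction inside a temp recovery map might look like. The class name, the byte budget, and the length-based size estimate are assumptions for illustration; a real change would reuse the bucket-region size estimation code mentioned above. Keys stay in the map and only values are overflowed, so the entry count after recovery matches what was on disk.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

// Hypothetical sketch, not Geode's AbstractRegionMap: a recovery-time map
// that overflows values (keeping keys) once an estimated byte budget is exceeded.
class EvictingRecoveryMap {
    private static final class Slot {
        byte[] value; // null once the value has been overflowed to disk
        Slot(byte[] value) { this.value = value; }
    }

    // Access-ordered, so iteration starts at the least recently used key.
    private final LinkedHashMap<String, Slot> slots = new LinkedHashMap<>(16, 0.75f, true);
    private final long budgetBytes;
    private long inMemoryBytes;
    private int evictions;

    EvictingRecoveryMap(long budgetBytes) { this.budgetBytes = budgetBytes; }

    // Assumption: the value's length is a good-enough size estimate.
    private static long estimate(byte[] value) { return value.length; }

    void recover(String key, byte[] value) {
        Slot old = slots.put(key, new Slot(value));
        if (old != null && old.value != null) {
            inMemoryBytes -= estimate(old.value);
        }
        inMemoryBytes += estimate(value);
        evictUntilUnderBudget();
    }

    private void evictUntilUnderBudget() {
        Iterator<Slot> it = slots.values().iterator();
        while (inMemoryBytes > budgetBytes && it.hasNext()) {
            Slot s = it.next();
            if (s.value != null) {
                // A real implementation would write s.value to the overflow file here.
                inMemoryBytes -= estimate(s.value);
                s.value = null;
                evictions++;
            }
        }
    }

    boolean containsKey(String key) { return slots.containsKey(key); }

    // Note: get() counts as an access in an access-ordered map.
    boolean isValueInMemory(String key) {
        Slot s = slots.get(key);
        return s != null && s.value != null;
    }

    long inMemoryBytes() { return inMemoryBytes; }
    int evictions() { return evictions; }
}
```

Throttling the recovery thread against the eviction rate is not shown; here `recover` simply evicts synchronously on the calling thread.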
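For approach 3, a minimal Java sketch of the stop-at-threshold check. The heap reading uses the standard `Runtime` methods; the threshold is a fraction (0.90) rather than a `critical-heap-percentage` value, and `CriticalHeapRecovery` with its `DoubleSupplier`/`Consumer` hooks is hypothetical, chosen so the heap reading and the disk read can be stubbed.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.DoubleSupplier;

// Hypothetical sketch of approach 3: stop loading values once heap usage
// reaches a critical fraction; the remaining values stay on disk only.
class CriticalHeapRecovery {
    // Fraction of the maximum heap currently in use, as reported by the JVM.
    static double heapUsedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    // Returns how many values were loaded before the threshold was hit.
    static int recoverValues(List<String> keys, double criticalFraction,
                             DoubleSupplier heapUsage, Consumer<String> loadValue) {
        int loaded = 0;
        for (String key : keys) {
            if (heapUsage.getAsDouble() >= criticalFraction) {
                // Approach 3 would raise LowMemoryException here; this sketch just stops.
                break;
            }
            loadValue.accept(key); // stand-in for reading the value from disk
            loaded++;
        }
        return loaded;
    }

    public static void main(String[] args) {
        int n = recoverValues(List.of("k1", "k2", "k3"), 0.90,
                CriticalHeapRecovery::heapUsedFraction, key -> { });
        System.out.println("values recovered: " + n);
    }
}
```

Keys that were not loaded would then be read from disk on first access, which is the first-read cost noted above.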

Thanks,
-Anil.

