Hi Geode Devs,

We are working on ticket GEODE-1672, which relates to running out of memory during recovery with overflow regions (heap LRU configured).
https://issues.apache.org/jira/browse/GEODE-1672

When recovering the persistent files, Geode stores the values in temp maps (one per region) using a background thread. Because these maps are not actual regions, they are not considered for LRU eviction, which can cause the system to run out of memory.

We are considering the following approaches to address this issue. Please let us know if you have any comments or suggestions on these solutions.

1. Skip recovering the regions marked with LRU eviction.
   - This keeps the code changes to a minimum.
   - The first access to the most recently used values will be expensive. But this is true even if the values are recovered, as Geode doesn't guarantee that the most recently (or most frequently) used values will be in memory after recovery.
   - This may impact use cases where regions are configured with LRU eviction even though there is no memory pressure (systems configured to handle unexpected events).

2. Include the temp maps (these are AbstractRegionMap instances) in eviction during recovery.
   - This may involve a lot of code changes. The size estimation code in bucket regions needs to be moved to AbstractRegionMap.
   - The recovery thread needs to be throttled based on the eviction rate, which could impact the recovery of regions without eviction. We could consider overriding the default eviction rate during recovery.
   - The regions will be in a similar state (number of entries) once the system has recovered.

3. Stop recovery when the system hits critical-heap-memory.
   - This requires setting (or recommending) critical-heap-percentage, and throwing LowMemoryException during recovery if the system is low on memory.
   - This may impact the first read on regions whose values were not recovered.

Thanks,
-Anil.
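
P.S. To make option 3 a bit more concrete, here is a rough Java sketch of stopping value recovery once heap usage crosses a critical threshold. This is not Geode code: ValueRecoverySketch, recover, heapFractionInUse and the String-to-byte[] maps are made-up names and types for illustration only. It also leaves unrecovered values as null (meaning "still on disk") instead of throwing LowMemoryException, which is just one possible reading of the option. Only the java.lang.management calls are real JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

// Hypothetical sketch, not Geode code. All class and method names are made up.
class ValueRecoverySketch {

  // Fraction of the heap currently in use, read from the standard JVM MXBean.
  static double heapFractionInUse() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    long max = heap.getMax();
    // getMax() returns -1 when the maximum is undefined; fall back to committed.
    long limit = max > 0 ? max : heap.getCommitted();
    return (double) heap.getUsed() / limit;
  }

  // Copies entries from onDisk into a temp map. Once the heap fraction reaches
  // criticalFraction, keys are still recorded but values are left as null
  // (assumed here to mean "still on disk, fault in on first read").
  static Map<String, byte[]> recover(Map<String, byte[]> onDisk,
                                     DoubleSupplier heapFraction,
                                     double criticalFraction) {
    Map<String, byte[]> tempMap = new LinkedHashMap<>();
    boolean stopValues = false;
    for (Map.Entry<String, byte[]> e : onDisk.entrySet()) {
      if (!stopValues && heapFraction.getAsDouble() >= criticalFraction) {
        stopValues = true; // stay stopped for the rest of this recovery pass
      }
      tempMap.put(e.getKey(), stopValues ? null : e.getValue());
    }
    return tempMap;
  }
}
```

A caller would pass ValueRecoverySketch::heapFractionInUse as the supplier, along with a threshold corresponding to the configured critical-heap-percentage. Taking the supplier as a parameter is only there so the cutoff can be exercised with fake readings.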