I like number 1

On Tue, Jan 17, 2017 at 3:56 PM Dan Smith <dsm...@pivotal.io> wrote:

> Option 1 is not a bad idea. Another thought might be to not start
> asynchronous value recovery until all of the regions are created. I think
> right now we launch a task to read all of the oplogs and recover values as
> soon as the disk store is created. Maybe we could defer that until after
> the last region in that disk store is actually created. Option 2 seems
> pretty complicated for a small window of time.
>
> -Dan
>
> On Tue, Jan 17, 2017 at 2:41 PM, Anilkumar Gingade <aging...@pivotal.io>
> wrote:
>
> > Hi Geode Devs,
> >
> > We are working on ticket GEODE-1672, related to out of memory during
> > recovery with overflow regions (heap LRU configured).
> >
> > https://issues.apache.org/jira/browse/GEODE-1672
> >
> > When recovering the persistent files, Geode stores the values into temp
> > maps (for regions) using a background thread. As these maps are not
> > actual regions, they are not considered/included for LRU eviction, which
> > causes the system to run OOM.
> >
> > We are thinking about the following approaches to address this issue.
> > Let us know if you have any comments/suggestions about the solutions.
> >
> > 1. Skip recovering the regions marked with LRU eviction.
> > - This keeps the code changes to a minimum.
> > - Accessing the most recently used values for the first time will be
> >   expensive. But this is true even if the values are recovered, as Geode
> >   doesn't guarantee the recently/most used values will be in memory
> >   after recovery.
> > - This may impact the use cases where regions are set with LRU eviction
> >   even though there is no memory pressure (system configured to handle
> >   unexpected events).
> >
> > 2. Include temp maps (these are AbstractRegionMap) for eviction during
> > recovery.
> > - May involve lots of code change. The size estimation code in bucket
> >   regions needs to be moved to AbstractRegionMap.
> > - Need to throttle the recovery thread based on the eviction rate,
> >   which could impact the recovery of regions without eviction. We can
> >   think of overriding the default eviction rate during recovery...
> > - The regions will be in a similar state (number of entries) when the
> >   system is recovered.
> >
> > 3. Stop recovery when system hits critical-heap-memory
> > - This requires setting/recommending critical-heap-percentage. Throw
> >   LowMemoryException during recovery if the system is low on memory.
> > - This may impact the first read on the region whose values are not
> >   recovered.
> >
> > Thanks,
> > -Anil.
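
For illustration, options 1 and 3 from the quoted proposal can be sketched in plain Java as below. This is only a rough sketch under stated assumptions, not Geode code: ValueRecoverySketch, PersistedEntry, recoverValues and jvmHeapUsedFraction are hypothetical names made up for this example, and the only real APIs used are the JDK's java.lang.management classes. Unlike the proposal, the sketch simply stops recovering values at the threshold instead of throwing LowMemoryException.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

// Hypothetical sketch: none of these types or methods exist in Geode.
class ValueRecoverySketch {

    // Stand-in for one persisted entry read back from an oplog.
    static final class PersistedEntry {
        final String key;
        final byte[] value;

        PersistedEntry(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Returns the keys whose values were brought into memory.
    // Entries that are skipped keep their values on disk and would be
    // faulted in on first read.
    static List<String> recoverValues(List<PersistedEntry> entries,
                                      boolean lruEviction,
                                      double criticalFraction,
                                      DoubleSupplier heapUsedFraction) {
        List<String> recovered = new ArrayList<>();
        if (lruEviction) {
            // Option 1: skip value recovery for regions with LRU eviction.
            return recovered;
        }
        for (PersistedEntry entry : entries) {
            // Option 3: stop once heap usage reaches the critical threshold.
            if (heapUsedFraction.getAsDouble() >= criticalFraction) {
                break;
            }
            recovered.add(entry.key);
        }
        return recovered;
    }

    // Heap usage as a fraction of the maximum heap, read from the JVM.
    // getMax() may return -1 when the maximum is undefined; report 0.0 then.
    static double jvmHeapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? (double) heap.getUsed() / max : 0.0;
    }
}
```

A caller would pass ValueRecoverySketch::jvmHeapUsedFraction as the last argument; taking the heap reading as a supplier just makes the threshold check easy to exercise with fake readings.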