Nilkanth, One reason I can think of: during recovery the system may be consuming more memory than it was at shutdown. This could be related to how data is recovered from disk (reading the krf and drf files first, and then the crf files). The memory may be used up by the data structures that read the disk files, before the data is added to the cache region and before the resource manager can kick in and evict. I will let the disk experts chime in here...
Can you try reducing the eviction and critical heap percentages, say eviction at 70% and critical at 85%?

http://gemfire.docs.pivotal.io/docs-gemfire/latest/managing/disk_storage/file_names_and_extensions.html

-Anil.

On Fri, Jul 15, 2016 at 2:01 PM, Nilkanth Patel <nilkanth.hpa...@gmail.com> wrote:

> Hello,
>
> I am facing an issue recovering data for persisted regions when a large
> amount of data (more than the heap) is persisted.
>
> A brief description of the scenario:
>
> I create 10 regions, let's call them R1, R2, R3, ... R10, with the
> following config.
> For R1, R2, total # of buckets = 113.
> For R3, R4, ... R10, # of buckets = 511.
>
> All of the above regions are configured with disk persistence enabled
> (ASYNCH) and eviction action overflow to disk, like:
>
> RegionFactory<Object, Object> rf =
>     cache.createRegionFactory(RegionShortcut.PARTITION_PERSISTENT_OVERFLOW);
> rf.setDiskSynchronous(false); // for asynch writes
> rf.setDiskStoreName("myDiskStore");
> PartitionAttributesFactory<Object, Object> paf =
>     new PartitionAttributesFactory<>();
> paf.setRedundantCopies(3);
> paf.setTotalNumBuckets(511);
>
> For each server, I set both --initial-heap and --max-heap to the same
> value, i.e. 16gb, with --eviction-heap-percentage=81
> --critical-heap-percentage=90.
>
> I keep the system running (puts, gets, deletes) for hours to add data over
> time, until I have overflowed tons of data approaching the heap size or
> more.
> Now I shut down my cluster and then attempt to restart it, but it does not
> come up. It seems that during this early phase of recovery (large amount of
> data), Geode surpasses the critical threshold, which kills it before a
> successful startup.
>
> Is this observation correct, and is this a known limitation? If so, is
> there any workaround for this?
>
> Also, considering the above case, is recovery for (1) the
> ForceDisconnect--->Autoconnect case and (2) the normal_shutdown-->restart
> case the same mechanism, or are there any differences?
>
> Thanks in advance,
>
> Nilkanth.
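
To put rough numbers on the thresholds mentioned in this thread (81/90 as configured, 70/85 as suggested), here is a small arithmetic sketch. It is not Geode API: the class HeapThresholds and its methods are made up for illustration. It assumes "16gb" means 16 GiB and that the percentages apply to the full maximum heap, which is a simplification and says nothing about how much memory recovery actually needs.

```java
// Hypothetical helper, not part of Geode: plain arithmetic on heap thresholds.
final class HeapThresholds {
    private HeapThresholds() {}

    /** Bytes of heap in use at which a percentage threshold is reached. */
    static long thresholdBytes(long maxHeapBytes, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]: " + percent);
        }
        return maxHeapBytes * percent / 100;
    }

    /** Bytes left between the critical threshold and the maximum heap. */
    static long headroomAboveCritical(long maxHeapBytes, int criticalPercent) {
        return maxHeapBytes - thresholdBytes(maxHeapBytes, criticalPercent);
    }

    public static void main(String[] args) {
        long maxHeap = 16L * 1024 * 1024 * 1024; // assumption: "16gb" is 16 GiB

        // Each pair is {eviction %, critical %}: the configured values from
        // the thread, then the lower values suggested in the reply.
        int[][] settings = { {81, 90}, {70, 85} };

        for (int[] s : settings) {
            long eviction = thresholdBytes(maxHeap, s[0]);
            long critical = thresholdBytes(maxHeap, s[1]);
            System.out.printf(
                "eviction=%d%% (%d bytes), critical=%d%% (%d bytes), "
                    + "gap=%d bytes, headroom above critical=%d bytes%n",
                s[0], eviction, s[1], critical,
                critical - eviction, headroomAboveCritical(maxHeap, s[1]));
        }
    }
}
```

Under these assumptions, the lower pair starts eviction earlier and widens the gap between the eviction and critical thresholds from 9 to 15 percentage points of heap, which gives the resource manager more room to evict before the critical threshold is hit.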