GitHub user dschneider-pivotal commented on a diff in the pull request:
https://github.com/apache/geode/pull/559#discussion_r120222821
--- Diff: geode-docs/managing/troubleshooting/system_failure_and_recovery.html.md.erb ---
@@ -276,8 +276,83 @@ find the reason.
Description:
-The process discovered that it was not in the distributed system and cannot determine why it was removed. The membership coordinator removed the member after it failed to respond to an internal are you alive message.
+The process discovered that it was not in the distributed system and cannot determine why it was
+removed. The membership coordinator removed the member after it failed to respond to an internal
+are-you-alive message.
Response:
The operator should examine the locator processes and logs.
+
+## <a id="restart-failure-persistent-lru" class="no-quick-link"></a> Restart Fails Due To Out-of-Memory Error
+
+This section describes a restart failure that can occur when the stopped system is one that was configured with persistent regions. Specifically:
+
+- Some of the regions of the recovering system, when running, were configured as PERSISTENT regions, which means that they save their data to disk.
+- At least one of the persistent regions was configured to evict least recently used (LRU) data by overflowing values to disk.
+
+### How Data is Recovered From Persistent Regions
+
+Data recovery, upon restart, always recovers keys. You can configure whether and how the system
+recovers the values associated with those keys to populate the system cache.
+
+**Value Recovery**
+
+- Recovering all values immediately during startup slows the startup time but results in consistent
+read performance after the startup on a "hot" cache.
+
+- Recovering no values means quicker startup but a "cold" cache, so the first retrieval of each value will read from disk.
+
+- Retrieving values asynchronously in a background thread allows a relatively quick startup on a "warm" cache
+that will eventually recover every value.
+
+**Retrieve or Ignore LRU values**
+
+When a system with persistent LRU regions shuts down, the system does not record which of the values
+were recently used. On subsequent startup, if values are recovered into an LRU region they may be
+the least recently used instead of the most recently used. Also, if LRU values are recovered on a
+heap or an off-heap LRU region, it is possible that the LRU memory limit will be exceeded, resulting
+in an `OutOfMemoryException` during recovery. For these reasons, LRU value recovery can be treated
+differently than non-LRU values.
+
+## Default Recovery Behavior for Persistent Regions
+
+The default behavior is for the system to recover all keys, then asynchronously recover all data
+values that were resident, leaving LRU values unrecovered. This default strategy is best for
--- End diff --
Drop "that were resident".