[
https://issues.apache.org/jira/browse/GEODE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868312#comment-15868312
]
Eric Shu commented on GEODE-2485:
---------------------------------
Suspend and resume can also be called from within the product itself.
For each create on a NORMAL or PRELOADED region in a transaction, the product
suspends the transaction, tries to fetch the remote version tag for the entry
from a replicate member, and then resumes the transaction.
{noformat}
  /**
   * Fetch Version for the given key from a remote replicate member.
   *
   * @param key
   * @throws EntryNotFoundException if the entry is not found on replicate member
   * @return VersionTag for the key
   */
  protected VersionTag fetchRemoteVersionTag(Object key) {
    VersionTag tag = null;
    assert this.dataPolicy != DataPolicy.REPLICATE;
    TransactionId txId = cache.getCacheTransactionManager().suspend();
    try {
      boolean retry = true;
      InternalDistributedMember member = getRandomReplicate();
      while (retry) {
        try {
          if (member == null) {
            break;
          }
          FetchVersionResponse response =
              RemoteFetchVersionMessage.send(member, this, key);
          tag = response.waitForResponse();
          retry = false;
        } catch (RemoteOperationException e) {
          member = getRandomReplicate();
          if (member != null) {
            if (logger.isDebugEnabled()) {
              logger.debug("Retrying RemoteFetchVersionMessage on member:{}",
                  member);
            }
          }
        }
      }
    } finally {
      if (txId != null) {
        cache.getCacheTransactionManager().resume(txId);
      }
    }
    return tag;
  }
{noformat}
> CacheTransactionManager suspend/resume can leak memory for 30 minutes
> ---------------------------------------------------------------------
>
> Key: GEODE-2485
> URL: https://issues.apache.org/jira/browse/GEODE-2485
> Project: Geode
> Issue Type: Bug
> Components: transactions
> Reporter: Darrel Schneider
>
> Each time you suspend/resume a transaction it leaves about 80 bytes of heap
> allocated for 30 minutes. If you are doing a high rate of suspend/resume
> calls then this could cause you to run out of memory in that 30 minute window.
> As a workaround you can set -Dgemfire.suspendedTxTimeout to a value as small
> as 1 (which would cause the memory to be freed up after 1 minute instead of
> 30 minutes).
> One fix for this is to periodically call cache.getCCPTimer().timerPurge()
> after a certain number of resume calls have been done (for example 1000).
> Currently resume is calling cancel on the TimerTask but that leaves the task
> in the SystemTimer queue until it expires. Calling timerPurge in addition to
> cancel will fix this bug. Calling timerPurge for every cancel may cause the
> resume method to take too long. Also keep in mind that getCCPTimer is used by
> other things, so the SystemTimer queue being purged will hold more than just
> the suspended txs.
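
The "purge after a certain number of resume calls" idea in the description can be sketched outside of Geode with a plain java.util.Timer, whose purge() method removes cancelled tasks from the timer's queue. This is only an illustration under stated assumptions: java.util.Timer stands in for the SystemTimer returned by getCCPTimer(), and the PurgingCanceller class and its purgeInterval parameter are hypothetical names, not part of Geode.

```java
import java.util.Timer;
import java.util.TimerTask;

/**
 * Hypothetical helper (not Geode code): cancels timer tasks and purges the
 * timer queue once every purgeInterval cancels, instead of on every cancel.
 */
public class PurgingCanceller {
  private final Timer timer;
  private final int purgeInterval;
  private int cancelsSincePurge;

  public PurgingCanceller(Timer timer, int purgeInterval) {
    if (purgeInterval <= 0) {
      throw new IllegalArgumentException("purgeInterval must be positive");
    }
    this.timer = timer;
    this.purgeInterval = purgeInterval;
  }

  /**
   * Cancels the task. Returns the number of cancelled tasks removed from the
   * queue if this call triggered a purge, otherwise 0.
   */
  public synchronized int cancel(TimerTask task) {
    // cancel() alone only marks the task; it stays queued until it would expire.
    task.cancel();
    if (++cancelsSincePurge >= purgeInterval) {
      cancelsSincePurge = 0;
      // purge() walks the whole queue, so it is amortized over many cancels.
      return timer.purge();
    }
    return 0;
  }

  public static void main(String[] args) {
    Timer timer = new Timer(true); // daemon thread, stand-in for the shared timer
    PurgingCanceller canceller = new PurgingCanceller(timer, 1000);
    int purged = 0;
    for (int i = 0; i < 2500; i++) {
      // Stand-in for the expiry task scheduled when a transaction is suspended.
      TimerTask expiry = new TimerTask() {
        @Override
        public void run() {}
      };
      timer.schedule(expiry, 30L * 60 * 1000); // 30 minute timeout
      purged += canceller.cancel(expiry); // stand-in for resume()
    }
    System.out.println("purged " + purged + " cancelled tasks");
    timer.cancel();
  }
}
```

With an interval of 1000 and 2500 suspend/resume pairs, the loop purges twice and reports 2000 tasks removed; the last 500 cancelled tasks stay queued until the next purge or until they expire. In the real product the queue is shared with other users of the timer, so each purge would scan more entries than just the suspended transactions, which is the reason for not purging on every cancel.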