[ 
https://issues.apache.org/jira/browse/GEODE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868312#comment-15868312
 ] 

Eric Shu commented on GEODE-2485:
---------------------------------

Suspend and resume can also be called from within the product itself.

For each create on a NORMAL or PRELOADED region in a transaction, the product 
suspends the transaction, tries to fetch the remote version tag for the entry, 
and then resumes the transaction.

{noformat}
  /**
   * Fetch Version for the given key from a remote replicate member.
   *
   * @param key
   * @throws EntryNotFoundException if the entry is not found on replicate member
   * @return VersionTag for the key
   */
  protected VersionTag fetchRemoteVersionTag(Object key) {
    VersionTag tag = null;
    assert this.dataPolicy != DataPolicy.REPLICATE;
    TransactionId txId = cache.getCacheTransactionManager().suspend();
    try {
      boolean retry = true;
      InternalDistributedMember member = getRandomReplicate();
      while (retry) {
        try {
          if (member == null) {
            break;
          }
          FetchVersionResponse response =
              RemoteFetchVersionMessage.send(member, this, key);
          tag = response.waitForResponse();
          retry = false;
        } catch (RemoteOperationException e) {
          member = getRandomReplicate();
          if (member != null) {
            if (logger.isDebugEnabled()) {
              logger.debug("Retrying RemoteFetchVersionMessage on member:{}",
                  member);
            }
          }
        }
      }
    } finally {
      if (txId != null) {
        cache.getCacheTransactionManager().resume(txId);
      }
    }
    return tag;
  }
{noformat}

> CacheTransactionManager suspend/resume can leak memory for 30 minutes
> ---------------------------------------------------------------------
>
>                 Key: GEODE-2485
>                 URL: https://issues.apache.org/jira/browse/GEODE-2485
>             Project: Geode
>          Issue Type: Bug
>          Components: transactions
>            Reporter: Darrel Schneider
>
> Each time you suspend/resume a transaction it leaves about 80 bytes of heap 
> allocated for 30 minutes. If you are doing a high rate of suspend/resume 
> calls then this could cause you to run out of memory in that 30 minute window.
> As a workaround you can set -Dgemfire.suspendedTxTimeout to a value as small 
> as 1 (which would cause the memory to be freed up after 1 minute instead of 
> 30 minutes).
> One fix for this is to periodically call cache.getCCPTimer().timerPurge() 
> after a certain number of resume calls have been done (for example 1000). 
> Currently resume calls cancel on the TimerTask, but that leaves the task 
> in the SystemTimer queue until it expires. Calling timerPurge in addition to 
> cancel will fix this bug. Calling timerPurge for every cancel may cause the 
> resume method to take too long. Keep in mind that getCCPTimer is also used 
> by other things, so the SystemTimer queue being purged will hold more than 
> just the suspended txs.
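
The purge-every-N idea from the issue description can be sketched with plain 
java.util.Timer standing in for Geode's SystemTimer. This is only an 
illustration under that assumption: the class name PurgeEveryN, the 
PURGE_INTERVAL constant, and both method names are hypothetical and are not 
taken from the Geode code base.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the suspend/resume expiry bookkeeping.
// java.util.Timer is used here in place of Geode's SystemTimer.
class PurgeEveryN {
  // Assumed threshold; the issue suggests "for example 1000".
  static final int PURGE_INTERVAL = 1000;

  // Daemon timer so the sketch does not keep the JVM alive.
  private final Timer timer = new Timer("suspended-tx-expiry", true);
  private final AtomicInteger cancels = new AtomicInteger();

  // Roughly what suspend would do: schedule an expiry task for the suspended tx.
  TimerTask scheduleExpiry(long delayMillis) {
    TimerTask task = new TimerTask() {
      @Override
      public void run() {
        // A real implementation would expire the suspended transaction here.
      }
    };
    timer.schedule(task, delayMillis);
    return task;
  }

  // Roughly what resume would do: cancel the task, and purge only every
  // PURGE_INTERVAL cancels so that a single resume stays cheap.
  // Returns the number of cancelled tasks removed from the timer queue.
  int cancelExpiry(TimerTask task) {
    task.cancel(); // marks the task cancelled but leaves it in the queue
    if (cancels.incrementAndGet() % PURGE_INTERVAL == 0) {
      return timer.purge(); // drops all cancelled tasks from the queue
    }
    return 0;
  }
}
```

With this shape, cancelled tasks stay queued for at most PURGE_INTERVAL 
resume calls rather than until their 30 minute expiry. Because the timer in 
Geode is shared with other users, the cost of each purge depends on the total 
queue size, which is why the purge is amortized rather than done on every 
cancel.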



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
