[ https://issues.apache.org/jira/browse/GEODE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868312#comment-15868312 ]
Eric Shu commented on GEODE-2485:
---------------------------------

Suspend and resume can also be called from within the product. For each create on a NORMAL or PRELOADED region in a transaction, the product suspends the transaction, tries to get the remote version tag for the entry, and then resumes the transaction.

{noformat}
  /**
   * Fetch Version for the given key from a remote replicate member.
   *
   * @param key
   * @throws EntryNotFoundException if the entry is not found on replicate member
   * @return VersionTag for the key
   */
  protected VersionTag fetchRemoteVersionTag(Object key) {
    VersionTag tag = null;
    assert this.dataPolicy != DataPolicy.REPLICATE;
    TransactionId txId = cache.getCacheTransactionManager().suspend();
    try {
      boolean retry = true;
      InternalDistributedMember member = getRandomReplicate();
      while (retry) {
        try {
          if (member == null) {
            break;
          }
          FetchVersionResponse response = RemoteFetchVersionMessage.send(member, this, key);
          tag = response.waitForResponse();
          retry = false;
        } catch (RemoteOperationException e) {
          member = getRandomReplicate();
          if (member != null) {
            if (logger.isDebugEnabled()) {
              logger.debug("Retrying RemoteFetchVersionMessage on member:{}", member);
            }
          }
        }
      }
    } finally {
      if (txId != null) {
        cache.getCacheTransactionManager().resume(txId);
      }
    }
    return tag;
  }
{noformat}

> CacheTransactionManager suspend/resume can leak memory for 30 minutes
> ---------------------------------------------------------------------
>
>                 Key: GEODE-2485
>                 URL: https://issues.apache.org/jira/browse/GEODE-2485
>             Project: Geode
>          Issue Type: Bug
>          Components: transactions
>            Reporter: Darrel Schneider
>
> Each time you suspend/resume a transaction it leaves about 80 bytes of heap
> allocated for 30 minutes. If you are doing a high rate of suspend/resume
> calls then this could cause you to run out of memory in that 30 minute window.
> As a workaround you can set -Dgemfire.suspendedTxTimeout to a value as small
> as 1 (which would cause the memory to be freed up after 1 minute instead of
> 30 minutes).
> One fix for this is to periodically call cache.getCCPTimer().timerPurge()
> after a certain number of resume calls have been done (for example 1000).
> Currently resume is calling cancel on the TimerTask but that leaves the task
> in the SystemTimer queue until it expires. Calling timerPurge in addition to
> cancel will fix this bug. Calling timerPurge for every cancel may cause the
> resume method to take too long. Keep in mind that getCCPTimer is used by
> other things, so the size of the SystemTimer queue that is being purged will
> not only be the number of suspended txs.
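
The periodic-purge idea in the description can be sketched with plain java.util.Timer, whose cancelled tasks also stay queued until they come due or purge() is called. This is only an illustration under the assumption that SystemTimer behaves the same way; PurgingCanceller, purgeInterval and noop() are hypothetical names, not Geode API, and the real fix would call cache.getCCPTimer().timerPurge() from resume.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not Geode API): cancels timer tasks and purges the
 * timer queue once every purgeInterval cancels instead of on every cancel.
 */
class PurgingCanceller {
  private final Timer timer;
  private final int purgeInterval;
  private final AtomicInteger cancelsSincePurge = new AtomicInteger();

  PurgingCanceller(Timer timer, int purgeInterval) {
    if (purgeInterval < 1) {
      throw new IllegalArgumentException("purgeInterval must be >= 1");
    }
    this.timer = timer;
    this.purgeInterval = purgeInterval;
  }

  /**
   * Cancels the task; every purgeInterval-th call also purges the queue.
   * Returns the number of cancelled tasks removed by this call, or 0 if
   * no purge ran.
   */
  int cancel(TimerTask task) {
    task.cancel(); // marks the task cancelled but leaves it in the queue
    if (cancelsSincePurge.incrementAndGet() >= purgeInterval) {
      // Concurrent callers may occasionally both purge; that is harmless.
      cancelsSincePurge.set(0);
      return timer.purge();
    }
    return 0;
  }

  /** Stand-in for a suspended-transaction timeout task. */
  static TimerTask noop() {
    return new TimerTask() {
      @Override
      public void run() {}
    };
  }

  public static void main(String[] args) {
    Timer timer = new Timer(true);
    // A live task due earlier than the rest keeps the timer thread from
    // discarding cancelled tasks itself, so only purge() removes them here.
    timer.schedule(noop(), 60_000L);
    PurgingCanceller canceller = new PurgingCanceller(timer, 1000);
    int removed = 0;
    for (int i = 0; i < 2500; i++) {
      TimerTask timeout = noop();
      timer.schedule(timeout, 30L * 60 * 1000); // 30 minute timeout
      removed += canceller.cancel(timeout);
    }
    System.out.println("removed by purge: " + removed);
    timer.cancel();
  }
}
```

Purging only every purgeInterval cancels bounds the number of dead tasks held in the queue while keeping the cost of a full queue scan off most resume calls, which matters because the timer is shared with other users.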