[ https://issues.apache.org/jira/browse/GEODE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876399#comment-15876399 ]
Darrel Schneider commented on GEODE-2485:
-----------------------------------------

In addition to fixing the code to periodically purge the SystemTimer, it would also be worth changing fetchRemoteVersionTag so that it does not use suspend/resume. All this code does is send a message. It wants that message always to be sent outside of a transaction, so it suspends the current one. A more performant approach is to refactor the code in RemoteOperationMessage that initializes the txUniqId, txMemberId, and isTransactionDistributed instance variables into a protected method called initializeTransaction, and then override that method in RemoteFetchVersionMessage to do nothing.

> CacheTransactionManager suspend/resume can leak memory for 30 minutes
> ---------------------------------------------------------------------
>
>                 Key: GEODE-2485
>                 URL: https://issues.apache.org/jira/browse/GEODE-2485
>             Project: Geode
>          Issue Type: Bug
>          Components: transactions
>            Reporter: Darrel Schneider
>
> Each time you suspend/resume a transaction, it leaves about 80 bytes of heap
> allocated for 30 minutes. If you are doing a high rate of suspend/resume
> calls, this could cause you to run out of memory within that 30-minute window.
> As a workaround, you can set -Dgemfire.suspendedTxTimeout to a value as small
> as 1 (which causes the memory to be freed after 1 minute instead of
> 30 minutes).
> One fix for this is to periodically call cache.getCCPTimer().timerPurge()
> after a certain number of resume calls have been done (for example, 1000).
> Currently resume calls cancel on the TimerTask, but that leaves the task
> in the SystemTimer queue until it expires. Calling timerPurge in addition to
> cancel will fix this bug. Calling timerPurge for every cancel may make the
> resume method take too long. Also keep in mind that getCCPTimer is used by
> other things, so the SystemTimer queue being purged will hold more than
> just the tasks for suspended transactions.
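To see why a high rate of suspend/resume matters, here is a back-of-the-envelope sketch of the heap held by lingering cancelled tasks, using the "about 80 bytes" figure from the issue description quoted above. The rate of 10,000 suspend/resume cycles per second is an assumed, illustrative number, and the calculation assumes every cancelled task stays queued for the full timeout. The class name SuspendResumeLeakEstimate is hypothetical.

```java
public class SuspendResumeLeakEstimate {
    // "About 80 bytes" per suspend/resume, as stated in the issue description.
    static final long BYTES_PER_CANCELLED_TASK = 80L;

    // Bytes still held when suspend/resume runs at a steady rate and every
    // cancelled timeout task lingers for the full suspended-tx timeout.
    static long retainedBytes(long suspendResumePerSecond, long timeoutMinutes) {
        long lingeringTasks = suspendResumePerSecond * timeoutMinutes * 60L;
        return lingeringTasks * BYTES_PER_CANCELLED_TASK;
    }

    public static void main(String[] args) {
        // 10,000 suspend/resume cycles per second is an assumed, illustrative rate.
        System.out.println(retainedBytes(10_000, 30)); // 1440000000 (default 30-minute timeout)
        System.out.println(retainedBytes(10_000, 1));  // 48000000 (-Dgemfire.suspendedTxTimeout=1)
    }
}
```

At that assumed rate, the default 30-minute timeout keeps roughly 1.44 GB reachable, while the -Dgemfire.suspendedTxTimeout=1 workaround caps it at roughly 48 MB.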
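Geode's SystemTimer and timerPurge() are internal, so this sketch uses the JDK's java.util.Timer instead. It has the cancel-versus-purge behavior the issue describes: TimerTask.cancel() only marks the task, and the cancelled task stays in the timer's queue until Timer.purge() removes it or the timer thread reaches it at its scheduled time. The sketch assumes Geode's SystemTimer behaves analogously; CancelVersusPurge and the 10-minute "unrelated" task are hypothetical.

```java
import java.util.Timer;
import java.util.TimerTask;

public class CancelVersusPurge {
    private static final long MINUTE_MS = 60L * 1000L;

    // Returns how many cancelled tasks Timer.purge() removed after one
    // timeout task was scheduled and then cancelled.
    static int cancelThenPurge() {
        Timer timer = new Timer("shared-timer", true); // daemon thread
        try {
            // An unrelated live task due sooner keeps the timer thread parked,
            // so the cancelled task is not dropped by the timer thread itself.
            timer.schedule(new TimerTask() {
                @Override public void run() { }
            }, 10 * MINUTE_MS);

            TimerTask expiry = new TimerTask() {
                @Override public void run() { /* would expire the suspended tx */ }
            };
            timer.schedule(expiry, 30 * MINUTE_MS); // arm the 30-minute timeout
            expiry.cancel();      // what resume does today: inert, but still queued
            return timer.purge(); // actually removes cancelled tasks from the queue
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println(cancelThenPurge()); // 1
    }
}
```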
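The fix proposed in the issue description, purging only after a certain number of resumes rather than on every cancel, could look roughly like this. It is a sketch, not Geode's code: java.util.Timer again stands in for the SystemTimer returned by getCCPTimer(), PeriodicPurgeOnResume, onResume, and simulate are hypothetical names, and the interval of 1000 is the example value from the issue.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: cancels a suspended tx's timeout task on resume and,
// every purgeInterval resumes, purges cancelled tasks from the shared timer.
public class PeriodicPurgeOnResume {
    private static final long MINUTE_MS = 60L * 1000L;

    private final Timer sharedTimer;
    private final int purgeInterval;
    private final AtomicInteger resumeCount = new AtomicInteger();

    PeriodicPurgeOnResume(Timer sharedTimer, int purgeInterval) {
        this.sharedTimer = sharedTimer;
        this.purgeInterval = purgeInterval;
    }

    // Called where resume cancels the timeout task. Returns the number of
    // cancelled tasks purged (0 when this call did not trigger a purge).
    int onResume(TimerTask timeoutTask) {
        timeoutTask.cancel();
        if (resumeCount.incrementAndGet() % purgeInterval == 0) {
            return sharedTimer.purge();
        }
        return 0;
    }

    // Simulates suspend/resume cycles on a timer that, like the CCP timer,
    // also carries an unrelated task. Returns the total number purged.
    static int simulate(int resumes, int purgeInterval) {
        Timer timer = new Timer("ccp-timer-stand-in", true);
        try {
            // Unrelated live task due soonest keeps the timer thread parked.
            timer.schedule(new TimerTask() {
                @Override public void run() { }
            }, 10 * MINUTE_MS);
            PeriodicPurgeOnResume purger = new PeriodicPurgeOnResume(timer, purgeInterval);
            int purged = 0;
            for (int i = 0; i < resumes; i++) {
                TimerTask timeout = new TimerTask() {
                    @Override public void run() { }
                };
                timer.schedule(timeout, 30 * MINUTE_MS); // suspend: arm timeout
                purged += purger.onResume(timeout);       // resume: cancel it
            }
            return purged;
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        // Purges run after resume 1000 and 2000; the last 500 stay queued.
        System.out.println(simulate(2500, 1000)); // 2000
    }
}
```

Counting with an AtomicInteger keeps the check cheap and thread-safe on the resume path, and it amortizes the cost of purge(), which walks the whole shared queue (including tasks unrelated to transactions), across many resumes.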
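The refactoring suggested in Darrel's comment is essentially the template-method pattern: the base class's constructor calls a protected hook that a subclass may override. This sketch is a heavily simplified, hypothetical model. TxState, NOTX, TxAwareMessage, txIdsCapturedInsideTx, and InitializeTransactionSketch are invented for illustration, and the ThreadLocal stands in for however the real RemoteOperationMessage obtains the caller's transaction state from Geode.

```java
import java.util.Arrays;

public class InitializeTransactionSketch {
    // Hypothetical stand-in for the calling thread's transaction state.
    static final class TxState {
        static final ThreadLocal<TxState> CURRENT = new ThreadLocal<>();
        final int uniqId;
        final String memberId;
        final boolean distributed;

        TxState(int uniqId, String memberId, boolean distributed) {
            this.uniqId = uniqId;
            this.memberId = memberId;
            this.distributed = distributed;
        }
    }

    abstract static class RemoteOperationMessage {
        static final int NOTX = -1; // hypothetical "not in a transaction" marker

        protected int txUniqId = NOTX;
        protected String txMemberId;
        protected boolean isTransactionDistributed;

        protected RemoteOperationMessage() {
            initializeTransaction();
        }

        // Tx-capturing code extracted from the constructor so subclasses can opt out.
        protected void initializeTransaction() {
            TxState tx = TxState.CURRENT.get();
            if (tx != null) {
                txUniqId = tx.uniqId;
                txMemberId = tx.memberId;
                isTransactionDistributed = tx.distributed;
            }
        }
    }

    // Hypothetical ordinary message that should join the caller's transaction.
    static final class TxAwareMessage extends RemoteOperationMessage { }

    static final class RemoteFetchVersionMessage extends RemoteOperationMessage {
        // Always sent outside any transaction: skip tx capture instead of
        // suspending and resuming the caller's transaction around the send.
        @Override
        protected void initializeTransaction() {
            // intentionally empty
        }
    }

    // Builds both messages while a transaction is active on this thread and
    // returns the txUniqId each one captured.
    static int[] txIdsCapturedInsideTx() {
        TxState.CURRENT.set(new TxState(42, "member-1", false));
        try {
            return new int[] {
                new TxAwareMessage().txUniqId, new RemoteFetchVersionMessage().txUniqId
            };
        } finally {
            TxState.CURRENT.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(txIdsCapturedInsideTx())); // [42, -1]
    }
}
```

Calling an overridable method from a constructor is normally a Java pitfall, because the override runs before the subclass's own fields are initialized; it is safe here only because the override is empty and touches no subclass state.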
-- This message was sent by Atlassian JIRA (v6.3.15#6346)