[ https://issues.apache.org/jira/browse/OOZIE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503280#comment-14503280 ]
Purshotam Shah commented on OOZIE-2206:
---------------------------------------

{quote}
Posted 9 months, 2 weeks ago (July 3, 2014, 6:59 p.m.)
core/src/main/java/org/apache/oozie/service/ZKLocksService.java (Diff revision 2), line 67:
reaper = new ChildReaper(zk.getClient(), LOCKS_NODE, Reaper.Mode.REAP_UNTIL_DELETE, getExecutorService(),

Shouldn't the mode be REAP_UNTIL_DELETE? REAP_UNTIL_DELETE will stop after the first deletion, right?
https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/locks/Reaper.Mode.html

The issue has been resolved.

Purshotam Shah 9 months, 1 week ago (July 11, 2014, 4:47 p.m.)
REAP_INDEFINITELY: Reap forever, or until removePath is called for the path.
I think this param is passed for each lock. REAP_INDEFINITELY is used as default implementation.
Will use REAP_INDEFINITELY.
{quote}

We started with REAP_UNTIL_DELETE but moved to REAP_INDEFINITELY based on the discussion on Review Board.
We should also check with the Curator team; there could be a bug in the Curator code.

> Change Reaper mode on ChildReaper in ZKLocksService
> ---------------------------------------------------
>
>                 Key: OOZIE-2206
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2206
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: Ryota Egashira
>            Assignee: Ryota Egashira
>             Fix For: trunk
>
>         Attachments: OOZIE-2206.patch
>
>
> OOZIE-1906 added a znode cleanup thread.
> It currently passes Reaper.Mode.REAP_INDEFINITELY, but this forces the Oozie
> server to keep reaping a znode even after that znode has been cleaned up.
> (https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/Reaper.java)
>
> This adds memory pressure on the Oozie server.
> We need to change the mode to REAP_UNTIL_GONE or REAP_UNTIL_DELETE.
>
> {code}
> reaper = new ChildReaper(zk.getClient(), LOCKS_NODE,
>     Reaper.Mode.REAP_INDEFINITELY, getExecutorService(),
>     ConfigurationService.getInt(services.getConf(), REAPING_THRESHOLD) * 1000,
>     REAPING_LEADER_PATH);
> {code}
>
> We hit one scenario where one ZK quorum slowed down for a short period, so
> many ZK locks were not released properly. Right after that, ChildReaper
> (which runs every 5 min) ran and has kept checking that list of znodes ever
> since. In the end, the Oozie server hit OOM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
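
To make the memory-pressure argument in the issue concrete, here is a minimal, self-contained sketch of how the three Reaper.Mode values affect the number of paths a reaper keeps tracking. It is a simplified model written only from the documented mode descriptions (reap forever; reap until a delete succeeds; reap until the path no longer exists). It is not Curator's Reaper code and does not call Curator or ZooKeeper. The class ReaperModeSketch, its methods, and the "/locks/lock-N" paths are hypothetical names introduced for illustration, and every existing znode is assumed to be empty and deletable.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified model only: NOT Curator's Reaper implementation.
class ReaperModeSketch {

    // Same constant names as Curator's Reaper.Mode; behavior below is modeled, not copied.
    enum Mode { REAP_INDEFINITELY, REAP_UNTIL_DELETE, REAP_UNTIL_GONE }

    private final Mode mode;
    // Hypothetical stand-in for ZooKeeper: the znodes that currently exist.
    private final Set<String> existingZnodes;
    // Hypothetical stand-in for the reaper's internal list of paths to re-check.
    private final Set<String> trackedPaths = new LinkedHashSet<>();

    ReaperModeSketch(Mode mode, Set<String> existingZnodes) {
        this.mode = mode;
        this.existingZnodes = existingZnodes;
    }

    void addPath(String path) {
        trackedPaths.add(path);
    }

    int trackedCount() {
        return trackedPaths.size();
    }

    // One reaping pass: try to delete each tracked path, then decide whether to keep tracking it.
    void reapOnce() {
        Iterator<String> it = trackedPaths.iterator();
        while (it.hasNext()) {
            String path = it.next();
            // Assumption: every existing znode is empty, so the delete succeeds if it exists.
            boolean deleted = existingZnodes.remove(path);
            if (deleted && mode != Mode.REAP_INDEFINITELY) {
                it.remove(); // delete succeeded: UNTIL_DELETE and UNTIL_GONE stop tracking
            } else if (!deleted && mode == Mode.REAP_UNTIL_GONE) {
                it.remove(); // path already gone: only UNTIL_GONE stops tracking
            }
            // REAP_INDEFINITELY never drops a path here, so the tracked set only grows.
        }
    }

    // Tracks pathCount hypothetical lock paths, runs the given number of passes,
    // and returns how many paths are still being tracked afterwards.
    static int trackedAfterPasses(Mode mode, int pathCount, boolean znodesExist, int passes) {
        Set<String> existing = new HashSet<>();
        ReaperModeSketch reaper = new ReaperModeSketch(mode, existing);
        for (int i = 0; i < pathCount; i++) {
            String path = "/locks/lock-" + i; // hypothetical path layout
            if (znodesExist) {
                existing.add(path);
            }
            reaper.addPath(path);
        }
        for (int p = 0; p < passes; p++) {
            reaper.reapOnce();
        }
        return reaper.trackedCount();
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": still tracking "
                    + trackedAfterPasses(mode, 100, true, 3) + " of 100 paths after 3 passes");
        }
    }
}
```

In this model, REAP_INDEFINITELY retains every path it was ever given, which matches the growth described in the issue. REAP_UNTIL_DELETE releases a path only after a successful delete, and REAP_UNTIL_GONE also releases paths that something else has already removed. Whether the real Reaper behaves exactly this way in every error case should be checked against the Curator source linked above.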