[ https://issues.apache.org/jira/browse/OOZIE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598964#comment-17598964 ]
Janos Makai commented on OOZIE-3254:
------------------------------------

Meanwhile I have reviewed and tested the current approach ([^OOZIE-3254-01-wip.patch]), but unfortunately I still ran into an OOM. The mentioned {{CoordMaterializeTransitionXCommand#{*}insertList{*}}} did get cleared, but the {{org.apache.oozie.command.coord.CoordActionNotificationXCommand}} and {{org.apache.oozie.command.coord.CoordActionInputCheckXCommand}} commands invoked inside {{org.apache.oozie.command.coord.CoordMaterializeTransitionXCommand}} still exhausted the memory.

I am currently working on a fix that avoids this by refactoring {{CoordMaterializeTransitionXCommand}} so that, for the LAST_ONLY and NONE coordinator execution types, it only materializes a limited number of actions.

> [coordinator] LAST_ONLY and NONE execution modes: possible OutOfMemoryError
> when there are too many coordinator actions to materialize
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-3254
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3254
>             Project: Oozie
>          Issue Type: Bug
>          Components: coordinator
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Assignee: Janos Makai
>            Priority: Major
>         Attachments: OOZIE-3254-01-wip.patch
>
>
> If there is a coordinator job defined with a {{frequency}} by the minute
> (e.g. {{frequency="* * * * *"}}), and {{start-time}} lies well in the past,
> and the coordinator job's {{execution-mode}} is {{LAST_ONLY}} or {{NONE}}, it
> can happen that too many {{CoordinatorActionBean}} instances are kept on JVM
> heap within {{CoordMaterializeTransitionXCommand#insertList}} as those
> execution modes [*omit the check for the {{throttle}}
> value*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java#L439-L443].
> As a consequence, we can see hundreds of thousands of log entries
> [*trying to increase
> {{CoordMaterializeTransitionXCommand#insertList}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java#L560-L566]:
> {noformat}
> [user@host ~]$ grep 'In storeToDB() coord action id' /var/log/oozie/oozie-HOSTNAME.log.out | wc -l
> 478408
> {noformat}
> A much worse consequence is that those {{CoordinatorActionBean}} instances
> remain reachable from a GC root (through the {{insertList}} itself), and
> thus the JVM is unable to free them until a subsequent call to
> {{insertList.clear()}}. In the worst case this results in an
> {{OutOfMemoryError}}.
> {{CoordMaterializeTransitionXCommand#insertList}} should be checked against
> a configurable limit parameter (with a default value of something like
> 1000), and persisted / cleared when that limit is reached.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
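
The limit-and-flush behaviour proposed at the end of the issue description can be sketched roughly as follows. This is a minimal, hypothetical illustration and not Oozie code: {{BoundedInsertBuffer}}, its {{limit}} parameter, and the {{persister}} callback are names assumed here, and the callback only stands in for whatever database write the command actually performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Oozie): buffers items and hands them to a
 * persister in batches, so the buffer never holds more than `limit` elements.
 */
final class BoundedInsertBuffer<T> {
    private final int limit;                   // assumed configurable limit, e.g. 1000
    private final Consumer<List<T>> persister; // stand-in for the real DB write
    private final List<T> buffer = new ArrayList<>();
    private int flushCount = 0;

    BoundedInsertBuffer(int limit, Consumer<List<T>> persister) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = limit;
        this.persister = persister;
    }

    /** Adds one item and flushes as soon as the limit is reached. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= limit) {
            flush();
        }
    }

    /** Persists the buffered items (if any) and releases the references. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        persister.accept(new ArrayList<>(buffer)); // pass a copy to the persister
        buffer.clear();                            // items become collectable here
        flushCount++;
    }

    int size() {
        return buffer.size();
    }

    int flushCount() {
        return flushCount;
    }
}
```

With a limit of 1000, adding 2500 items triggers two automatic flushes and leaves 500 items buffered until a final explicit {{flush()}}, so the buffer never holds more than 1000 elements. As the comment above notes, bounding the list alone did not prevent the OOM in testing, because other commands invoked during materialization also consumed memory.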