[
https://issues.apache.org/jira/browse/OOZIE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andras Salamon resolved OOZIE-3181.
-----------------------------------
Resolution: Duplicate
> High frequency LAST_ONLY coord. job with many past time actions kills Oozie
> server
> ----------------------------------------------------------------------------------
>
> Key: OOZIE-3181
> URL: https://issues.apache.org/jira/browse/OOZIE-3181
> Project: Oozie
> Issue Type: Bug
> Affects Versions: 4.3.0
> Reporter: Oleksandr Kalinin
> Priority: Major
>
> A user who submits a high-frequency LAST_ONLY coordinator job with a start
> time in the past (intentionally or by mistake) triggers an enormous
> materialization loop for that job and potentially an OOM condition, even
> with high heap settings.
> The simplest example is:
> coordStarts=2017-02-12T09:00Z
> coordEnds=2019-02-12T09:00Z
> coordFrequency=*/1 * * * *
> <execution>LAST_ONLY</execution>
> Since throttling parameters are ignored for past LAST_ONLY actions, this
> triggers unthrottled materialization of more than 500K actions lying in the
> past, which causes severe memory pressure and an eventual GC overhead lockout.
> At the same time, by definition all past actions will be skipped anyway, so
> it seems that the only value in materializing them is the ability to view
> their SKIPPED status later. Is it really worth the risk?
> Note: this problem is made more severe by the fact that recovery is not
> trivial on ZK-coordinated clusters. The write lock persists, which prevents
> the kill command from taking the desired effect, and the lock also persists
> after a restart. To recover, the write lock has to be removed manually.
> Looking at the materialization loop code, I believe there is potential for
> an algorithm improvement to prevent this issue.
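
To put a number on the example above, and to illustrate one possible direction for the algorithm improvement, here is a minimal sketch. It is not Oozie code. It assumes a fixed one-minute interval in place of the cron expression `*/1 * * * *`, and a hypothetical submission time exactly one year after `coordStarts`. The function names `count_past_actions` and `latest_past_nominal_time` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def count_past_actions(start, now, frequency):
    """Count nominal times in [start, now] for a fixed frequency.

    This is how many actions a loop that materializes every past
    nominal time would have to create.
    """
    if now < start:
        return 0
    return (now - start) // frequency + 1


def latest_past_nominal_time(start, now, frequency):
    """Return the most recent nominal time at or before `now`.

    Computed arithmetically, so the cost does not grow with the
    number of past actions. Returns None if the job starts in the future.
    """
    if now < start:
        return None
    elapsed_intervals = (now - start) // frequency
    return start + elapsed_intervals * frequency


# coordStarts from the example in the report.
start = datetime(2017, 2, 12, 9, 0, tzinfo=timezone.utc)
# Hypothetical submission time (an assumption, not stated in the report).
now = datetime(2018, 2, 12, 9, 0, tzinfo=timezone.utc)
# Fixed interval standing in for the cron frequency "*/1 * * * *".
every_minute = timedelta(minutes=1)

print(count_past_actions(start, now, every_minute))        # 525601
print(latest_past_nominal_time(start, now, every_minute))  # 2018-02-12 09:00:00+00:00
```

Under these assumptions the job has 525601 past nominal times, which matches the "more than 500K" figure, while the only action LAST_ONLY would actually run can be found in constant time. Whether skipping the intermediate actions is acceptable depends on how much the SKIPPED history matters, which is the open question raised in the report.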
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)