[jira] [Resolved] (OOZIE-3181) High frequency LAST_ONLY coord. job with many past time actions kills Oozie server

Andras Salamon (JIRA) Fri, 12 Apr 2019 03:03:39 -0700


     [ https://issues.apache.org/jira/browse/OOZIE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andras Salamon resolved OOZIE-3181.
-----------------------------------
    Resolution: Duplicate

> High frequency LAST_ONLY coord. job with many past time actions kills Oozie 
> server
> ----------------------------------------------------------------------------------
>
>                 Key: OOZIE-3181
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3181
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: 4.3.0
>            Reporter: Oleksandr Kalinin
>            Priority: Major
>
> A user who submits a high-frequency LAST_ONLY coordinator job with a start 
> time in the past (intentionally or by mistake) triggers an enormous 
> materialization loop for that job, and potentially an OOM condition even 
> with high heap settings.
> The simplest example is:
> coordStarts=2017-02-12T09:00Z
> coordEnds=2019-02-12T09:00Z
> coordFrequency=*/1 * * * *
> <execution>LAST_ONLY</execution>
> Since throttling parameters are ignored for past LAST_ONLY actions, this 
> triggers unthrottled materialization of more than 500K actions lying in the 
> past, which causes severe memory pressure and an eventual GC overhead lockout.
> At the same time, by definition all past actions will be skipped anyway, so 
> the only value in materializing them seems to be the ability to view their 
> SKIPPED status later. Is it really worth the risk?
> Note: this problem is made more severe by the fact that recovery is not 
> trivial on ZK-coordinated clusters. The write lock persists, which prevents 
> the kill command from taking the desired effect, and the lock also persists 
> after a restart. To recover, the write lock has to be removed manually.
> Looking at the materialization loop code, I believe there is potential for 
> an algorithmic improvement to prevent this issue.
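
To make the scale in the report concrete, here is a minimal sketch that counts how many past nominal times the example job covers and finds the single most recent one directly, without iterating. It is not Oozie's materialization code. The function names are hypothetical, the cron expression `*/1 * * * *` is modelled as a fixed one-minute interval, the end time is assumed to be exclusive, and the submission time of 2018-02-12T09:00Z is an assumption chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def count_past_actions(start, end, frequency, now):
    """Count nominal times start + k * frequency that fall before min(end, now).

    Assumes a fixed interval and an exclusive end time (both are assumptions
    of this sketch, not statements about Oozie internals).
    """
    horizon = min(end, now)
    if horizon <= start:
        return 0
    elapsed = horizon - start
    # Ceiling division on timedeltas: number of k >= 0 with k * frequency < elapsed.
    return -(-elapsed // frequency)


def latest_past_nominal_time(start, end, frequency, now):
    """Return the most recent past nominal time, or None if there is none.

    Computed arithmetically, so the cost does not grow with the number of
    past actions.
    """
    n = count_past_actions(start, end, frequency, now)
    if n == 0:
        return None
    return start + (n - 1) * frequency


# Values from the report; `now` is a hypothetical submission time.
start = datetime(2017, 2, 12, 9, 0, tzinfo=timezone.utc)
end = datetime(2019, 2, 12, 9, 0, tzinfo=timezone.utc)
every_minute = timedelta(minutes=1)
now = datetime(2018, 2, 12, 9, 0, tzinfo=timezone.utc)

past = count_past_actions(start, end, every_minute, now)
latest = latest_past_nominal_time(start, end, every_minute, now)
print(past)                # 525600
print(latest.isoformat())  # 2018-02-12T08:59:00+00:00
```

Under these assumptions, one year of one-minute actions already gives 525,600 past nominal times, in line with the "more than 500K" figure in the report. The second function illustrates one possible direction for the suggested improvement: if every past action except the last is going to be skipped, the last one can be located with arithmetic instead of a loop.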



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
