[ 
https://issues.apache.org/jira/browse/FLINK-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michel Davit updated FLINK-9080:
--------------------------------
    Attachment: class_loader_leak.png

> Flink Scheduler goes OOM, suspecting a memory leak
> --------------------------------------------------
>
>                 Key: FLINK-9080
>                 URL: https://issues.apache.org/jira/browse/FLINK-9080
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.4.0
>            Reporter: Rohit Singh
>            Assignee: Stefan Richter
>            Priority: Major
>         Attachments: Screenshot 2018-12-18 at 12.14.11.png, Top Level 
> packages.JPG, Top level classes.JPG, class_loader_leak.png, classesloaded vs 
> unloaded.png
>
>
> Running Flink version 1.4.0 on Mesos. The scheduler runs together with the 
> job manager in a single container, whereas the task managers run in separate 
> containers.
> A couple of jobs were running continuously, and the Flink scheduler was 
> working properly along with the task managers. Due to some change in the 
> data, one of the jobs started failing continuously. In the meantime, there 
> was a surge in Flink scheduler memory usage, and the scheduler eventually 
> died of OOM.
>  
> A memory dump analysis was done. The findings were as follows: 
> !Top Level packages.JPG!!Top level classes.JPG!
>  * The majority of the top-level packages retaining heap pointed towards 
> Flinkuserclassloader, glassfish (Jersey library), and Finalizer classes (see 
> the top-level packages image).
>  * The top-level classes belonged to Flinkuserclassloader (see the top-level 
> classes image).
>  * Very few classes were unloaded compared to the number loaded (see the 
> attached classes loaded vs unloaded graph), in spite of adding the JVM 
> options -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled. The scheduler 
> was restarted 3 times.
>  * There were also custom classes that were duplicated during subsequent 
> class uploads.
> All the heap dump images are attached. Can you suggest some pointers on how 
> to overcome this issue?
>  
>  
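
The loaded-versus-unloaded observation in the description can also be checked from inside the JVM, without a heap dump. The following is a minimal sketch, not part of the original report: it reads the standard java.lang.management.ClassLoadingMXBean counters. The class name ClassLoadingStats and the snapshot() helper are hypothetical names chosen for illustration, and the three-sample loop with a one-second interval is an arbitrary assumption.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not from the report): samples JVM-wide class loading counters.
class ClassLoadingStats {

    /** Returns {currently loaded, total loaded since JVM start, unloaded since JVM start}. */
    static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            bean.getLoadedClassCount(),      // classes currently loaded
            bean.getTotalLoadedClassCount(), // classes loaded since JVM start
            bean.getUnloadedClassCount()     // classes unloaded since JVM start
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample a few times; the count and interval are arbitrary for this sketch.
        for (int i = 0; i < 3; i++) {
            long[] s = snapshot();
            System.out.printf("loaded=%d totalLoaded=%d unloaded=%d%n", s[0], s[1], s[2]);
            Thread.sleep(1000);
        }
    }
}
```

If the unloaded counter stays near zero while the total loaded counter keeps growing across job restarts, that is consistent with user-code class loaders not being released, although it does not by itself prove a leak.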



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
