ashishdeole opened a new issue, #25510:
URL: https://github.com/apache/beam/issues/25510

   ### What happened?
   
   ### Pre-requisites 
   An independent Flink cluster must be running (either standalone or using
   the Kubernetes resource provider). We used Flink 1.14.5.
   Jobs must be submitted using Flink's session mode.
   
   ### Steps to reproduce 
   
   1. Use the WordCount example from Beam (reproduced with the latest stable
      release and with 2.38.0).
   2. Run the WordCount pipeline on the Flink cluster.
   3. Submit the same job multiple times. Depending on the JVM configuration,
      after some jobs we get `java.lang.OutOfMemoryError: Metaspace`.
   4. Even without waiting for the OutOfMemoryError, a heap dump shows that
      `ChildFirstClassLoader` is not garbage collected after each job
      finishes, which leaks memory.
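
To watch the growth described in step 3 without waiting for the error, Metaspace usage can be read through the standard JMX memory pool beans. This is only a sketch: the class name `MetaspaceProbe` is hypothetical, the pool name `Metaspace` is what HotSpot JVMs report, and the probe is only meaningful when it runs inside (or is attached to) the JVM being observed, which is an assumption about your setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Hypothetical helper, not part of Beam or Flink.
class MetaspaceProbe {

    /**
     * Returns the bytes currently used by the memory pool named "Metaspace",
     * or -1 if this JVM does not expose a pool with that name.
     */
    static long usedMetaspaceBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage().getUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used (bytes): " + usedMetaspaceBytes());
    }
}
```

If the value keeps climbing after each finished job and does not drop after a full GC, that is consistent with class loaders not being released.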
   
   ### Analysis
   
   1. When jobs are submitted to Flink in session mode, Flink handles each
      submitted job on a `jobmanager-io-thread` taken from a thread pool.
   2. Beam's `PipelineOptionsFactory` holds a `ThreadLocal` of
      `DefaultDeserializationContext.Impl`.
   3. `DefaultDeserializationContext.Impl` is loaded by the
      `ChildFirstClassLoader`. After the job completes, the JobManager thread
      goes back to the pool, but this `ThreadLocal` is never removed. The
      pooled thread therefore keeps a reference to the class loader, which
      leads to the memory leak.
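
The mechanism in the analysis above can be reproduced in isolation. The sketch below does not use Beam or Flink: `CONTEXT` is a hypothetical stand-in for the thread-local deserialization context, and a single-thread executor stands in for the JobManager I/O pool. It shows that a value set by one task is still attached to the pooled thread when the next task runs, unless the task calls `ThreadLocal.remove()`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration, not Beam or Flink code.
class ThreadLocalLeakSketch {

    // Stand-in for the per-thread context; in the real case the value's class
    // is loaded by the job's class loader, so the thread pins that loader.
    static final ThreadLocal<Object> CONTEXT = new ThreadLocal<>();

    /**
     * Runs one simulated job on the pool. Returns true if a value left behind
     * by an earlier job was already present on the pooled thread.
     */
    static boolean runJob(ExecutorService pool, boolean cleanUp) throws Exception {
        return pool.submit(() -> {
            boolean stale = CONTEXT.get() != null;
            CONTEXT.set(new Object());
            try {
                // The job's work would happen here.
                return stale;
            } finally {
                if (cleanUp) {
                    // Detach the value before the thread returns to the pool.
                    CONTEXT.remove();
                }
            }
        }).get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            runJob(pool, false); // first job leaves its value behind
            System.out.println("stale value seen after job without cleanup: " + runJob(pool, true));
            System.out.println("stale value seen after job with cleanup: " + runJob(pool, false));
        } finally {
            pool.shutdown();
        }
    }
}
```

The first line prints `true` and the second prints `false`. Whether a `remove()` call, or some other approach, is the right fix inside `PipelineOptionsFactory` is not something this sketch decides.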
   
   
   Attached is the heap dump produced by the `OutOfMemoryError: Metaspace`
   after submitting the WordCount job 4-5 times
   (`jobmanager.memory.jvm-metaspace.size: 100mb` was deliberately set low
   to reproduce the issue).
   
   ### References
   
   1. https://cwiki.apache.org/confluence/display/FLINK/Debugging+ClassLoader+leaks
   2. https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java
   3. The deserialization context was made thread-local in this
      [PR](https://github.com/apache/beam/pull/16680) for bug BEAM-13782.
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

