NianticRyan opened a new issue, #24098:
URL: https://github.com/apache/beam/issues/24098

   ### What happened?
   
   Hi, 
   
   I recently encountered an issue when upgrading my Apache Beam SDK. I was attempting to upgrade from 2.30.0 to 2.40.0, and hit a file descriptor leak: roughly every time a new pipeline was started, Java opened a new set of file descriptors to the libraries on the classpath. As a result, many duplicate file descriptors stayed open until the process eventually hit its maximum number of open file descriptors.
   
   I was able to narrow it down to the bump from 2.35.0 to 2.36.0, and I am 99% certain that [this commit](https://github.com/apache/beam/commit/970bdc0ed7142f5263e9a78fc4d715d50539e7ef) is the source.
   
   The main difference in this code is that `classGraph.disableNestedJarScanning().addClassLoader(classLoader).scan(1).getClasspathFiles();` creates a new `Scanner` object with `performScan=true`, whereas `classGraph.disableNestedJarScanning().addClassLoader(classLoader).getClasspathFiles();` does the same with `performScan=false`.
   `performScan=true` runs an asynchronous scan of the classpath, while `performScan=false` just generates a placeholder `ScanResult` containing only the classpath. The scan itself is what creates the fd leak.
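To illustrate the leak pattern without depending on ClassGraph, here is a minimal, self-contained sketch. It is a simplified model, not ClassGraph's or Beam's code: the hypothetical `LeakyScanResult` opens one `RandomAccessFile` per classpath entry (as the `RandomAccessFile.<init>` frame in the trace in this issue does) and releases them only in `close()`. Calling `getClasspathFiles()` on a result that is never closed leaves the handles open; wrapping the result in try-with-resources releases them. The names `LeakyScanResult`, `leakyLookup`, `safeLookup` and `leakedAfter` are illustrative only.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ScanLeakDemo {
    // Hypothetical stand-in for a scan result that keeps classpath jars open.
    static final class LeakyScanResult implements Closeable {
        // Handles opened by any LeakyScanResult and not yet released via close().
        static int openHandles = 0;
        private final List<RandomAccessFile> handles = new ArrayList<>();
        private final List<File> classpathFiles;

        LeakyScanResult(List<File> classpathFiles) throws IOException {
            this.classpathFiles = classpathFiles;
            for (File f : classpathFiles) {
                // Open each classpath element for reading, one handle per file.
                handles.add(new RandomAccessFile(f, "r"));
                openHandles++;
            }
        }

        List<File> getClasspathFiles() {
            return classpathFiles;
        }

        @Override
        public void close() throws IOException {
            for (RandomAccessFile raf : handles) {
                raf.close();
                openHandles--;
            }
            handles.clear();
        }
    }

    // Leaky pattern: the result is dropped without being closed.
    static List<File> leakyLookup(List<File> cp) throws IOException {
        return new LeakyScanResult(cp).getClasspathFiles();
    }

    // Non-leaky pattern: try-with-resources closes the result before returning.
    static List<File> safeLookup(List<File> cp) throws IOException {
        try (LeakyScanResult result = new LeakyScanResult(cp)) {
            return result.getClasspathFiles();
        }
    }

    // Runs both patterns and returns how many handles remain open afterwards.
    static int leakedAfter(int safeRuns, int leakyRuns) {
        try {
            File jar = File.createTempFile("fake-lib", ".jar");
            jar.deleteOnExit();
            List<File> cp = List.of(jar);
            int before = LeakyScanResult.openHandles;
            for (int i = 0; i < safeRuns; i++) {
                safeLookup(cp);
            }
            for (int i = 0; i < leakyRuns; i++) {
                leakyLookup(cp);
            }
            return LeakyScanResult.openHandles - before;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("leaked after 3 safe lookups: " + leakedAfter(3, 0));  // 0
        System.out.println("leaked after 3 leaky lookups: " + leakedAfter(0, 3)); // 3
    }
}
```

ClassGraph's `ScanResult` implements `Closeable`, so the analogous change in Beam would presumably be to call `scan(1)` inside a `try (ScanResult scanResult = ...) { ... }` block and read `getClasspathFiles()` before the result is closed. This is a suggested direction under that assumption, not a confirmed patch.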
   
   The leaked file descriptors are opened by the ClassGraph scanner. Here's one example of an open file descriptor:
   ```
   #697 /default/lib/guice-assistedinject-3.0.jar by thread:ClassGraph-worker-2 on Wed Nov 09 22:10:48 UTC 2022
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:244)
        at nonapi.io.github.classgraph.fileslice.FileSlice.<init>(FileSlice.java:134)
        at nonapi.io.github.classgraph.fileslice.FileSlice.<init>(FileSlice.java:178)
        at nonapi.io.github.classgraph.fastzipfilereader.PhysicalZipFile.<init>(PhysicalZipFile.java:87)
        at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$1.newInstance(NestedJarHandler.java:93)
        at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$1.newInstance(NestedJarHandler.java:90)
        at nonapi.io.github.classgraph.concurrency.SingletonMap.get(SingletonMap.java:189)
        at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$4.newInstance(NestedJarHandler.java:189)
        at nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler$4.newInstance(NestedJarHandler.java:154)
        at nonapi.io.github.classgraph.concurrency.SingletonMap.get(SingletonMap.java:189)
        at io.github.classgraph.ClasspathElementZip.open(ClasspathElementZip.java:162)
        at io.github.classgraph.Scanner$3.processWorkUnit(Scanner.java:595)
        at io.github.classgraph.Scanner$3.processWorkUnit(Scanner.java:567)
        at nonapi.io.github.classgraph.concurrency.WorkQueue.runWorkLoop(WorkQueue.java:246)
        at nonapi.io.github.classgraph.concurrency.WorkQueue.runWorkQueue(WorkQueue.java:161)
        at io.github.classgraph.Scanner.processWorkUnits(Scanner.java:342)
        at io.github.classgraph.Scanner.openClasspathElementsThenScan(Scanner.java:1047)
        at io.github.classgraph.Scanner.call(Scanner.java:1146)
        at io.github.classgraph.Scanner.call(Scanner.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   ```
   
   Our setup uses Kubernetes to launch Google Cloud Dataflow jobs with the Apache Beam SDK.
   Please let me know if there is any other information I can provide.
   
   Is anyone else hitting a similar issue?
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: runner-core
