knight6236 opened a new pull request, #10678:
URL: https://github.com/apache/seatunnel/pull/10678

   When a ClassLoader's reference count reaches zero, the ClassLoader is only 
removed from the cache and never closed. This causes JAR file handle leaks and 
Metaspace growth in long-running SeaTunnel Engine clusters.
   
   Changes:
   - Call URLClassLoader.close() when the reference count reaches zero (non-cache mode)
   - Close all ClassLoaders when ClassLoaderService is shut down
   - Disable JAR URL connection cache to prevent stale file handles
   - Add optional deep clean mode for URLClassPath and JarFileFactory global 
cache cleanup (disabled by default, requires --add-opens)
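
The first two changes can be pictured with a small reference-counted registry. This is only a sketch under assumptions: `RefCountedLoaders`, `acquire`, `release` and `closeAll` are hypothetical names chosen for illustration and are not the `ClassLoaderService` API touched by this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not SeaTunnel's ClassLoaderService API.
final class RefCountedLoaders {

    private static final class Entry {
        final URLClassLoader loader;
        int refs;

        Entry(URLClassLoader loader) {
            this.loader = loader;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    /** Returns the loader cached under key, creating it on first use, and counts the reference. */
    synchronized URLClassLoader acquire(String key, URL[] jars) {
        Entry entry = cache.computeIfAbsent(
                key, k -> new Entry(new URLClassLoader(jars, getClass().getClassLoader())));
        entry.refs++;
        return entry.loader;
    }

    /** Drops one reference; closes the loader and returns true when the count reaches zero. */
    synchronized boolean release(String key) {
        Entry entry = cache.get(key);
        if (entry == null || --entry.refs > 0) {
            return false;
        }
        cache.remove(key);
        close(entry.loader); // the step that was missing: release the JAR file handles now
        return true;
    }

    /** Closes every loader that is still cached, for example on service shutdown. */
    synchronized void closeAll() {
        UncheckedIOException failure = null;
        for (Entry entry : cache.values()) {
            try {
                close(entry.loader);
            } catch (UncheckedIOException e) {
                if (failure == null) {
                    failure = e;
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        cache.clear();
        if (failure != null) {
            throw failure;
        }
    }

    synchronized int size() {
        return cache.size();
    }

    private static void close(URLClassLoader loader) {
        try {
            loader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a caller would pair every `acquire` with a `release` in a `finally` block, so the loader is closed by whichever job gives up the last reference.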
   
   
   ### Purpose of this pull request
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Check list
   
   * [ ] If any new Jar binary package is added in your PR, please add a License 
Notice according to the
     [New License 
Guide](https://github.com/apache/seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [ ] If necessary, please update the documentation to describe the new 
feature. https://github.com/apache/seatunnel/tree/dev/docs
   * [ ] If necessary, please update `incompatible-changes.md` to describe the 
incompatibility caused by this PR.
   * [ ] If you are contributing the connector code, please check that the 
following files are updated:
     1. Update 
[plugin-mapping.properties](https://github.com/apache/seatunnel/blob/dev/plugin-mapping.properties)
 and add new connector information in it
     2. Update the pom file of 
[seatunnel-dist](https://github.com/apache/seatunnel/blob/dev/seatunnel-dist/pom.xml)
     3. Add ci label in 
[label-scope-conf](https://github.com/apache/seatunnel/blob/dev/.github/workflows/labeler/label-scope-conf.yml)
     4. Add e2e testcase in 
[seatunnel-e2e](https://github.com/apache/seatunnel/tree/dev/seatunnel-e2e/seatunnel-connector-v2-e2e/)
     5. Update connector 
[plugin_config](https://github.com/apache/seatunnel/blob/dev/config/plugin_config)
     
     
    ```markdown
   ### Purpose
   
   Close `URLClassLoader` explicitly when its reference count reaches zero,
   instead of relying on GC finalization to release JAR file handles.
   
   ### Background
   
   In the current implementation, when a job finishes and the ClassLoader
   reference count drops to zero, the ClassLoader is removed from the
   internal cache but `URLClassLoader.close()` is never called. This leads
   to:
   
   - JAR file handle leaks (fd exhaustion under high job throughput)
   - Metaspace growth proportional to the number of completed jobs
   - On Windows, JAR files cannot be deleted or replaced while handles
     are held
   
   ### Changes
   
   1. **Explicit close on release**: Call `URLClassLoader.close()` when the
      reference count reaches zero in non-cache mode.
   
   2. **Close on service shutdown**: Invoke `close()` on all cached
      ClassLoaders when `ClassLoaderService.close()` is called.
   
   3. **Disable JAR URL connection cache**: Call
      `URLConnection.setDefaultUseCaches(false)` at service startup to
      prevent the JVM from caching JAR connections globally, which would
      otherwise keep file handles open even after `close()`.
   
   4. **Optional deep clean mode**: Add the `SEATUNNEL_CLASSLOADER_DEEP_CLEAN`
      system property (default `false`) to enable reflection-based cleanup
      of `URLClassPath` internal state and the `JarFileFactory` global cache.
      The cleanup is targeted: it only removes entries belonging to the
      ClassLoader being closed and does not affect other concurrent jobs.
   
      On JDK 9+, this mode requires the following JVM options:
   
      ~~~
      --add-opens java.base/java.net=ALL-UNNAMED
      --add-opens java.base/sun.net.www.protocol.jar=ALL-UNNAMED
      ~~~
   
      On JDK 8, no additional options are needed.
   
      This is intended for environments with extreme Metaspace pressure or
      file handle exhaustion that cannot be resolved by `close()` alone.
   
   ### How to test
   
   - Existing unit tests in `ClassLoaderServiceTest` continue to pass.
   - Verified that after `releaseClassLoader()`, the JAR files are no
     longer locked and can be deleted immediately.
   - Verified that Metaspace usage remains stable after repeated job
     submissions in a long-running cluster.
   
   ### API Changes
   
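   No public API changes. One new constant, `SEATUNNEL_CLASSLOADER_DEEP_CLEAN`,
   is added; the existing `CLASSLOADER_SERVICE_SKIP_CHECK_JAR` is listed for
   reference: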
   
   | Constant | Type | Default | Description |
   |----------|------|---------|-------------|
   | `CLASSLOADER_SERVICE_SKIP_CHECK_JAR` | Environment Variable | `false` | Skip JAR file existence check (existing, unchanged) |
   | `SEATUNNEL_CLASSLOADER_DEEP_CLEAN` | System Property (`-D`) | `false` | Enable reflection-based deep clean of JVM internal JAR caches |
   
   ### Compatibility
   
   - **JDK 8**: Fully compatible, no additional JVM options required.
   - **JDK 11**: Fully compatible. Deep clean mode requires `--add-opens`
     options listed above; without them, deep clean silently falls back
     to standard `close()` only.
   - **Cache mode**: In cache mode, ClassLoaders are shared across jobs and
     are only closed during service shutdown, consistent with existing
     behavior.
   - **Non-cache mode**: ClassLoaders are now explicitly closed when their
     reference count reaches zero. This is the primary behavioral change.
   
   ### Checklist
   
   - [x] I have read the [Contributing
     Guidelines](https://github.com/apache/seatunnel/blob/dev/CONTRIBUTING.md).
   - [x] I have created an issue in the [SeaTunnel issue
     tracker](https://github.com/apache/seatunnel/issues) and linked it to
     this PR.
   - [x] My changes do not break existing unit tests.
   - [x] I have added documentation if necessary.
   ```
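
For change 3 in the description above, note that `setDefaultUseCaches(boolean)` is an instance method on `java.net.URLConnection` even though it changes a JVM-wide default, so the call needs some connection object. The sketch below shows one way to make that call without opening a real connection. `JarCacheDefaults` and `disableUrlConnectionCaching` are hypothetical names, and this is not necessarily how the PR does it.

```java
import java.net.URLConnection;

// Hypothetical helper; the class and method names are not taken from the PR.
final class JarCacheDefaults {

    /** Turns off the JVM-wide URLConnection caching default and returns the new default. */
    static boolean disableUrlConnectionCaching() {
        // A throwaway subclass provides an instance without touching any file or network.
        URLConnection probe = new URLConnection(null) {
            @Override
            public void connect() {
                // Never called; the probe exists only to reach the setter below.
            }
        };
        probe.setDefaultUseCaches(false);
        return probe.getDefaultUseCaches();
    }
}
```

The setting is process-wide and only affects connections created after the call, which is why the description places it at service startup. On Java 9 and later, the static overload `URLConnection.setDefaultUseCaches(String protocol, boolean defaultVal)` could restrict the change to the `jar` protocol, if narrowing the scope is preferred.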

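The opt-in and fallback behaviour described for deep clean mode could look roughly like the following. It is a sketch under assumptions: `DeepCleanProbe` and its methods are hypothetical, and the private `ucp` field of `URLClassLoader` is a JDK implementation detail that is only probed here, not cleaned.

```java
import java.lang.reflect.Field;
import java.net.URLClassLoader;

// Hypothetical sketch of gating and fallback; not the PR's actual deep clean code.
final class DeepCleanProbe {

    /** Property name taken from the PR description. */
    static final String DEEP_CLEAN_PROPERTY = "SEATUNNEL_CLASSLOADER_DEEP_CLEAN";

    /** True only when the property is set to "true", e.g. -DSEATUNNEL_CLASSLOADER_DEEP_CLEAN=true. */
    static boolean deepCleanRequested() {
        return Boolean.getBoolean(DEEP_CLEAN_PROPERTY);
    }

    /**
     * Checks whether reflection can reach URLClassLoader's private "ucp" field
     * (assumed JDK internal). Returns false instead of throwing so that the
     * caller can fall back to a plain close().
     */
    static boolean internalsAccessible() {
        try {
            Field ucp = URLClassLoader.class.getDeclaredField("ucp");
            ucp.setAccessible(true); // may be rejected on newer JDKs without --add-opens
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false;
        }
    }

    /** Deep clean runs only when it was requested and the JDK internals are reachable. */
    static boolean shouldDeepClean() {
        return deepCleanRequested() && internalsAccessible();
    }
}
```

With a guard like this, a missing `--add-opens` option degrades to the standard `close()` path rather than failing the job, which matches the fallback described under Compatibility.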

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
