knight6236 opened a new issue, #10669:
URL: https://github.com/apache/seatunnel/issues/10669
## Background
SeaTunnel already provides solid foundations for plugin loading and
execution:
- Centralized ClassLoader management via `ClassLoaderService`
- Dynamic loading and execution of connectors
- Certain class isolation mechanisms (e.g., child‑first loader)
From a pure functionality perspective, this works well and covers most
current use cases.
However, from the perspective of **long‑running Engine nodes (repeated job
scheduling, varying plugin combinations)**, the lifecycle management and
isolation boundaries of ClassLoaders still exhibit some “non‑deterministic”
behavior. I recently conducted a systematic review of the ClassLoader‑related
code and would like to share some observations and a possible evolution path.
Feedback and discussion are highly welcome.
---
## Gap
### 1. Semantics of “release” vs “close”
`DefaultClassLoaderService.releaseClassLoader()` removes the ClassLoader
from the cache and performs some thread‑local cleanup when the reference count
reaches zero, but **it never calls `URLClassLoader.close()` explicitly**. The
service’s own `close()` only clears the internal map.
**Potential impacts**:
- JAR file handles are released only when GC decides to do so
- On Windows, JAR files may remain locked, preventing deletion or replacement
- Metaspace reclamation is not deterministic; over time it may accumulate
### 2. Classloader boundaries can still change at runtime
Several code paths still inject dependencies into the current ClassLoader
via `addURL` at runtime, for example:
- Reflection‑based `addURL` in `AbstractPluginDiscovery`
- Plugin dependency injection into the current loader in Flink execution
paths
**Potential impacts**:
- The actual classloader boundary depends not only on the loader structure
but also on runtime behavior
- When multiple jobs reuse the same process with different plugin sets,
“historical contamination” may occur
- Any attempt to verify unloading is weakened by this mutable boundary
### 3. Unmanaged residual references
The codebase uses multiple patterns for thread‑context ClassLoaders
(synchronous, asynchronous, cross‑thread). Some places:
- Do not restore the original TCCL in a `finally` block
- Restore the wrong baseline when switching across threads
In addition, typical ClassLoader‑holding points such as JDBC driver
registration (e.g., TDengine) or connectors that directly set TCCL without
restoration are not yet managed in a unified way.
> See the appendix for concrete code references.
---
## Goal
Improve SeaTunnel for long‑running deployments in terms of:
- **Controllable ClassLoader lifecycle**
- **Stable classloading boundaries**
- **Predictable resource reclamation and verifiability**
Move from “ClassLoader works” to “ClassLoader is governable”.
---
## Proposal
The following is a **progressive, non‑breaking** improvement path, which can
be implemented in phases.
### Phase 1: Close the lifecycle explicitly
- **Goal**: Resource release no longer depends on GC; instead it is
triggered explicitly.
- **Idea**: For `URLClassLoader` instances created by SeaTunnel, call
`close()` when the reference count reaches zero.
- **Acceptance**: When caching is off, JAR files can be deleted immediately
after release; Metaspace does not grow with job count.
### Phase 2: Stabilize the classloading boundary
- **Goal**: Eliminate runtime `addURL` from production paths.
- **Idea**: Remove reflective `addURL` calls; determine the complete
classpath before creating the loader.
- **Acceptance**: The same loader instance has a constant classpath across
time; no cross‑job pollution.
### Phase 3: Consolidate residual references
- **Goal**: Unified handling of TCCL, JDBC drivers, threads, and
ThreadLocals.
- **Idea**:
- Encapsulate TCCL switching with try‑with‑resources
- Pair JDBC driver registration with deregistration
- Clearly define thread ownership for background threads
- **Acceptance**: All TCCL switches restore the original context;
`DriverManager` does not hold outdated loaders; threads can be cleaned up.
### Phase 4 (optional): Enable verifiable reclamation
- **Goal**: Provide engineering evidence that a ClassLoader has actually
been collected.
- **Idea**: Use `WeakReference` + `ReferenceQueue` to track loaders, or
expose simple runtime metrics (e.g., number of live loaders).
- **Acceptance**: After repeated job runs, old loaders can be observed to be
reclaimed.
All phases are designed to be compatible with existing configurations such
as `classloader-cache-mode`.
---
## Non-goals
This discussion does not aim to:
- Introduce breaking changes (phase 1 aims for backward compatibility)
- Provide full runtime‑equivalent validation
- Change existing connector semantics
- Restructure the plugin model or ClassLoader architecture
---
## Open Questions
I would love to hear the community’s experience and thoughts:
1. Have you encountered **Metaspace growth, JAR file locks, or Windows
file‑replacement issues** in production?
2. With `classloader-cache-mode = true`, would explicit `close()` conflict
with the caching design? (I have some ideas for compatibility but want to
understand the design intent first.)
3. Is there already a community‑wide convention for handling TCCL and JDBC
driver registration, or is each module currently responsible for its own
cleanup?
---
## Appendix (Code references & diagnostics)
<details>
<summary>Click to expand: Key code locations (based on
2.3.13-release)</summary>
### 1. Release path without explicit close
- `DefaultClassLoaderService.releaseClassLoader()` – [line
103‑132](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-engine/seatunnel-engine-core/src/main/java/org/apache/seatunnel/engine/core/classloader/DefaultClassLoaderService.java#L103-L132)
– no `close()` call
- `DefaultClassLoaderService.close()` – [line
193‑197](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-engine/seatunnel-engine-core/src/main/java/org/apache/seatunnel/engine/core/classloader/DefaultClassLoaderService.java#L193-L197)
– only clears map
### 2. Runtime addURL
- `AbstractPluginDiscovery.addURL` – [line
80‑87](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-plugin-discovery/src/main/java/org/apache/seatunnel/plugin/discovery/AbstractPluginDiscovery.java#L80-L87)
– reflective `addURL`
- `AbstractPluginDiscovery` fallback – [line
220‑233](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-plugin-discovery/src/main/java/org/apache/seatunnel/plugin/discovery/AbstractPluginDiscovery.java#L220-L233)
– `URLClassLoader` without `close`
- Flink starter related:
[`FlinkAbstractPluginExecuteProcessor`](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-core/seatunnel-flink-starter/seatunnel-flink-starter-common/src/main/java/org/apache/seatunnel/core/starter/flink/execution/FlinkAbstractPluginExecuteProcessor.java#L40-L69)
and
[`FlinkExecution`](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-core/seatunnel-flink-starter/seatunnel-flink-starter-common/src/main/java/org/apache/seatunnel/core/starter/flink/execution/FlinkExecution.java#L194-L195)
### 3. TCCL not restored / residual references
- `TaskExecutionService` – [line
783‑816](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/TaskExecutionService.java#L783-L816)
– cooperative worker TCCL without `finally`
- `RequestSplitOperation` – [line
54‑63](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/task/operation/source/RequestSplitOperation.java#L54-L63)
– TCCL switch without try/finally
- `SourceRegisterOperation` – [line
60‑86](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/task/operation/source/SourceRegisterOperation.java#L60-L86)
– cross‑thread restoration wrong baseline
- `NotifyTaskRestoreOperation` – [line
88‑125](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/checkpoint/operation/NotifyTaskRestoreOperation.java#L88-L125)
– same issue
- `IcebergCatalogLoader` – [line
60](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-connectors-v2/connector-iceberg/src/main/java/org/apache/seatunnel/connectors/seatunnel/iceberg/IcebergCatalogLoader.java#L60)
– direct `setContextClassLoader` without restore
- `PaimonCatalogLoader` – [line
78](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-connectors-v2/connector-paimon/src/main/java/org/apache/seatunnel/connectors/seatunnel/paimon/catalog/PaimonCatalogLoader.java#L78)
– same
- `LanceCatalogLoader` – [line
49](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-connectors-v2/connector-lance/src/main/java/org/apache/seatunnel/connectors/seatunnel/lance/catalog/LanceCatalogLoader.java#L49)
– same
- `TDengineUtil` – [line
46‑56](https://github.com/apache/seatunnel/blob/2.3.13-release/seatunnel-connectors-v2/connector-tdengine/src/main/java/org/apache/seatunnel/connectors/seatunnel/tdengine/utils/TDengineUtil.java#L46-L56)
– `DriverManager.registerDriver` without deregistration
</details>
<details>
<summary>Click to expand: Diagnostic methods (how to detect whether a
ClassLoader is reclaimed)</summary>
- **WeakReference + ReferenceQueue**: Create a WeakReference when a loader
is released; poll the queue to know if it has been collected.
- **JMX**: Monitor `java.lang.ClassLoading` for loaded class count.
- **Heap dump**: Use MAT to search for `SeaTunnelChildFirstClassLoader`
instances and analyze GC roots.
</details>
---
The observations above are based on reading the code; I may have missed or
misinterpreted something. If you have encountered similar issues in production,
or if there are design decisions I am not aware of, please feel free to share.
Let’s discuss! 🙌
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]