Great proposal! I have a few questions to make sure I understand it.

1. If the same task is executed multiple times, will these JARs be shared? If a task ends and its JARs are deleted, will that affect other tasks that still use them?
2. Can we cache these JARs, so that the next task doesn't need to load them again?

Looking forward to your reply.

-- 
Best Regards
------------
Liugddx
[email protected]

On Mon, Jul 3, 2023 at 20:33, 梁欢 <[email protected]> wrote:

> Hello everyone,
>
> When the Zeta engine submits a job, the client first loads the connector plugin locally and saves the absolute paths of the connector JAR package and of the third-party JAR packages that the connector depends on at runtime (such as the database driver package) in the logical execution plan of the job. After the task is submitted to the Zeta engine server, the server obtains the paths of the JAR packages required by each task from the logical execution plan. It then uses these paths to load the JAR packages on the server and execute them.
>
> However, this approach has two significant limitations:
>
> 1. The server needs to have all connectors and their dependent JAR packages.
> 2. The installation path of the client must be exactly the same as that of the server, and the installation path of SeaTunnel Zeta on all nodes must also be the same.
>
> This makes the engine side of SeaTunnel Zeta relatively heavy, and the container image becomes very large when submitting tasks in Docker or Kubernetes (K8S).
>
> To address these limitations, we need to optimize the logic of the Zeta engine when executing tasks. The server should only have the core JAR package of the engine, while all connector packages should reside on the client side. When submitting tasks, the client should upload the required JAR packages to the server instead of just recording their paths. When the server executes a job, it should download the required JAR packages and then load them. Once the job is completed, the JAR packages can be deleted.
>
> In Docker or K8S mode, there is currently no unified JAR package management service provided for project requirements.
> This includes JAR packages for connectors and the JAR packages that connectors depend on. To reduce the image size, only the framework package of the Zeta engine needs to be included in the container image. The JAR package of the connector and the third-party JAR packages that the connector depends on can be uploaded separately to the pod for distribution. Therefore, a component that supports the upload and download of all JAR package files must be implemented on the JobMaster side. The client that submits the task is responsible for uploading the connector's JAR package and the third-party JAR package files that the connector depends on to this component for unified management. All TaskExecutors deployed in different containers are responsible for downloading the required JAR packages from this component. The service component on the JobMaster side needs to ensure reliable file management until the SeaTunnel task completes, by persisting the JAR packages to the local file system or to a distributed storage service such as HDFS or S3.
>
> For the details of this feature design, please refer to [1].
>
> [1] https://github.com/apache/seatunnel/issues/5012
>
> Best wishes!
>
> Huan Liang
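
To make the sharing and caching questions concrete, here is a minimal sketch of one way the JobMaster-side component could behave. It is only an illustration and is not taken from the design in [1]: the `JarStore` class, its method names, and the choice of SHA-256 content hashes as keys are all assumptions, and it is written in Python purely for brevity. Identical JARs uploaded by different jobs are stored once, and a JAR is deleted only when the last job referencing it is released.

```python
import hashlib


class JarStore:
    """Hypothetical JAR store: content-addressed and reference-counted per job."""

    def __init__(self):
        self._blobs = {}  # SHA-256 hex digest -> JAR bytes
        self._refs = {}   # SHA-256 hex digest -> set of job ids using that JAR

    def upload(self, job_id, jar_bytes):
        """Register a JAR for a job; identical content is stored only once."""
        key = hashlib.sha256(jar_bytes).hexdigest()
        self._blobs.setdefault(key, jar_bytes)
        self._refs.setdefault(key, set()).add(job_id)
        return key

    def download(self, key):
        """Return the JAR bytes for a key, as a TaskExecutor would fetch them."""
        return self._blobs[key]

    def release(self, job_id):
        """Drop a finished job's references and delete JARs nobody else uses."""
        for key in list(self._refs):
            self._refs[key].discard(job_id)
            if not self._refs[key]:
                del self._refs[key]
                del self._blobs[key]

    def stored_count(self):
        """Number of distinct JARs currently held."""
        return len(self._blobs)


# Two jobs upload the same (fake) connector JAR and get the same key back.
store = JarStore()
key_a = store.upload("job-1", b"fake connector jar")
key_b = store.upload("job-2", b"fake connector jar")
store.release("job-1")  # job-2 can still download the JAR after this
```

With this kind of bookkeeping, a finished task would not affect others that share the same JAR. Keeping JARs cached for future jobs would need an extra retention policy (for example, a time-to-live) instead of deleting on the last release, which the sketch does not cover.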
