Hi @Liugddx, I want to express my gratitude for raising these important questions.

All Jars associated with the current job are isolated through JobID, and all Jar packages required for the current job execution will be deleted at the end of the task execution.

If there is a need to share Jar packages between different tasks, I believe it would be beneficial to assign a count attribute to each Jar package. This attribute indicates how many unfinished jobs still require the Jar package file. After a job is executed, the count value of the corresponding Jar package is decremented by 1. A Jar package file is deleted only when no running job relies on it.

The above is a simple solution that I have come up with. I would appreciate hearing your thoughts on this matter.

Best regards,
Huan Liang

At 2023-07-04 12:21:24, "Guangdong Liu" <[email protected]> wrote:
>Great proposal! I have a few questions to understand.
>
>1. If the same task is executed multiple times, will these jars be shared?
>If a task ends, will it affect other tasks?
>
>2. Can we cache these jars? Maybe the next task doesn't need to load again.
>
>
>Looking forward to your reply.
>--
>
>Best Regards
>
>------------
>
>Liugddx
>[email protected]
>
>
>梁欢 <[email protected]> wrote on Monday, July 3, 2023 at 20:33:
>
>> Hello everyone,
>>
>> When the Zeta engine submits a job, the client first loads
>> the connector plugin locally and saves the absolute path of the connector
>> JAR package and the third-party JAR package that the connector runtime
>> depends on (such as the database driver package) in the logical execution
>> plan of the job. After submitting the task to the Zeta engine server, the
>> server obtains the paths of the required JAR packages for each task from
>> the logical execution plan. It then uses these paths to load the JAR
>> packages from the server and execute them.
>>
>> However, this approach has two significant limitations:
>>
>> The server needs to have all connectors and their dependent JAR packages.
>>
>> The installation path of the client must be exactly the same as the
>> server, and the installation path of SeaTunnel Zeta in all nodes must also
>> be the same. This leads to the engine side of SeaTunnel Zeta being
>> relatively heavy, and the container volume becoming very large when
>> performing Docker or Kubernetes (K8S) submission tasks.
>>
>> To address these limitations, we need to optimize the logic of the Zeta
>> engine when executing tasks. The server should only have the core JAR
>> package of the engine, while all connector packages should reside on the
>> client side. When submitting tasks, the client should upload the required
>> JAR package to the server instead of just keeping the path of the JAR
>> package. When the server executes a job, it should download the required
>> JAR package and then load it. Once the job is completed, the JAR package
>> can be deleted.
>>
>> In Docker or K8S mode, there is currently no unified JAR package
>> management service provided for project requirements. This includes JAR
>> packages for connectors and JAR packages that connectors depend on. To
>> reduce container volume, only the framework package of the Zeta engine
>> needs to be included in the container image. The JAR package of the
>> connector and the third-party JAR package that the connector depends on can
>> be separately uploaded to the pod for distribution. Therefore, a component
>> that supports the upload and download of all JAR package files must be
>> implemented on the JobMaster side. The client that submits the task is
>> responsible for uploading the connector's JAR package and the third-party
>> JAR package files that the connector depends on to this component for
>> unified management. All TaskExecutors deployed on different containers are
>> responsible for downloading the required JAR packages from this component.
>> The service components on the JobMaster side need to ensure reliable file
>> management until the completion of the SeaTunnel task, by persisting JAR
>> packages to local file systems or other distributed storage services such
>> as HDFS or S3.
>>
>> For the details of this feature design, you can refer to [1].
>>
>> [1] https://github.com/apache/seatunnel/issues/5012
>>
>> Best wishes!
>>
>> Huan Liang
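
P.S. To make the count attribute from my reply at the top of this message more concrete, here is a minimal sketch. It is only an illustration under my own assumptions: JarRefCounter, acquire, release, count and the jarId key are hypothetical names, not existing SeaTunnel classes or methods, and an in-memory map stands in for whatever storage the JobMaster-side component would actually use. I also assume the count goes up by 1 when a job that needs the Jar starts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of SeaTunnel: tracks how many
// unfinished jobs still need each shared Jar package.
class JarRefCounter {
    // jarId is an assumed identifier, e.g. a file name or content hash.
    private final Map<String, Integer> counts = new HashMap<>();

    // Called when a job that needs the Jar starts; returns the new count.
    synchronized int acquire(String jarId) {
        return counts.merge(jarId, 1, Integer::sum);
    }

    // Called when a job finishes. Returns true when no running job
    // relies on the Jar any more, so the caller may delete the file.
    synchronized boolean release(String jarId) {
        Integer current = counts.get(jarId);
        if (current == null) {
            throw new IllegalStateException("release without acquire: " + jarId);
        }
        if (current == 1) {
            counts.remove(jarId);
            return true;
        }
        counts.put(jarId, current - 1);
        return false;
    }

    // Number of unfinished jobs currently using the Jar (0 if none).
    synchronized int count(String jarId) {
        return counts.getOrDefault(jarId, 0);
    }
}
```

With two jobs sharing one driver Jar, the first release would return false (one job is still running) and the second would return true, which is the point at which the file could be cleared.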
