github-actions[bot] commented on issue #15120: URL: https://github.com/apache/dolphinscheduler/issues/15120#issuecomment-1792151271
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

dolphinscheduler version: 3.1.2

Value.xml has been configured for persistence. The spark-2.4.7-bin-hadoop2.7 and hadoop-2.7.0.tar environments are configured in the worker Pod, and the hadoop, yarn, hdfs and other site.xml files from the external hadoop cluster have been copied in.

I use dolphinscheduler 3.1.2 deployed by helm to submit tasks to the external hadoop cluster for scheduling. After stopping the task on the workflow instance page, the task on the external yarn continues to execute, and the worker Pod reports an error:

```
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 17 common frames omitted
[INFO] 2023-11-03 15:52:14.679 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[238] - [WorkflowInstance-0][TaskInstance-111] - Get appIds from worker dolphinscheduler-worker-0.dolphinscheduler-worker-headless:1234 taskLogPath: /opt/dolphinscheduler/logs/20231103/10676729589025_23-107-111.log
[INFO] 2023-11-03 15:52:14.679 +0800 org.apache.dolphinscheduler.service.log.LogClient:[208] - [WorkflowInstance-0][TaskInstance-111] - Begin to get appIds from worker: dolphinscheduler-worker-0.dolphinscheduler-worker-headless:1234 taskLogPath: /opt/dolphinscheduler/logs/20231103/10676729589025_23-107-111.log
[INFO] 2023-11-03 15:52:14.680 +0800 org.apache.dolphinscheduler.plugin.task.api.utils.LogUtils:[66] - [WorkflowInstance-0][TaskInstance-111] - Find appId: application_1693365157704_0040 from /opt/dolphinscheduler/logs/20231103/10676729589025_23-107-111.log
[INFO] 2023-11-03 15:52:14.680 +0800 org.apache.dolphinscheduler.service.log.LogClient:[222] - [WorkflowInstance-0][TaskInstance-111] - Get appIds: [application_1693365157704_0040] from worker : dolphinscheduler-worker-0.dolphinscheduler-worker-headless:1234 taskLogPath: /opt/dolphinscheduler/logs/20231103/10676729589025_23-107-111.log
[INFO] 2023-11-03 15:52:14.686 +0800 org.apache.dolphinscheduler.service.utils.ProcessUtils:[96] - [WorkflowInstance-0][TaskInstance-111] - get kerberos init command
[INFO] 2023-11-03 15:52:14.687 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[144] - [WorkflowInstance-0][TaskInstance-111] - kill cmd:sudo -u hdfs sh /tmp/dolphinscheduler/exec/process/hdfs/10667691377184/10676729589025_23/107/111/application_1693365157704_0040.kill
[ERROR] 2023-11-03 15:52:14.696 +0800 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[147] - [WorkflowInstance-0][TaskInstance-111] - Kill yarn application app id [ application_1693365157704_0040] failed: [/tmp/dolphinscheduler/exec/process/hdfs/10667691377184/10676729589025_23/107/111/application_1693365157704_0040.kill: 4: source: not found
/tmp/dolphinscheduler/exec/process/hdfs/10667691377184/10676729589025_23/107/111/application_1693365157704_0040.kill: 7: yarn: not found
```

The following is part of the configuration of value.xml. The IP and password have been omitted.

    conf:
      common:
        # user data local directory path, please make sure the directory exists and have read write permissions
        data.basedir.path: /tmp/dolphinscheduler
        # resource storage type: HDFS, S3, NONE
        resource.storage.type: S3
        # resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
        resource.storage.upload.base.path: /dolphinscheduler
        # whether to startup kerberos
        hadoop.security.authentication.startup.state: false
        # java.security.krb5.conf path
        java.security.krb5.conf.path: /opt/krb5.conf
        # login user from keytab username
        login.user.keytab.username: [email protected]
        # login user from keytab path
        login.user.keytab.path: /opt/hdfs.headless.keytab
        # kerberos expire time, the unit is hour
        kerberos.expire.time: 2
        # resource view suffixs
        #resource.view.suffixs: txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js
        # if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
        resource.hdfs.root.user: hdfs
        # if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
        resource.hdfs.fs.defaultFS: s3a://dolphinscheduler
        # The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
        resource.aws.access.key.id: admin
        # The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
        resource.aws.secret.access.key: xxxxxxx
        # The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
        resource.aws.region: cn-north-1
        # The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
        resource.aws.s3.bucket.name: dolphinscheduler
        # You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
        resource.aws.s3.endpoint: http://10.200.x.xxx:9000
        # resourcemanager port, the default value is 8088 if not specified
        resource.manager.httpaddress.port: 8088
        # if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
        yarn.resourcemanager.ha.rm.ids: 192.168.xx.xx,192.168.xx.xx
        # if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
        yarn.application.status.address: http://ndsc03.slave.com:%s/ws/v1/cluster/apps/%s
        # job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
        yarn.job.history.status.address: http://ndsc03.slave.com:19888/ws/v1/history/mapreduce/jobs/%s
        # datasource encryption enable
        datasource.encryption.enable: false
        # datasource encryption salt
        datasource.encryption.salt: '!@#$%^&*'
        #data quality option
        data-quality.jar.name: dolphinscheduler-data-quality-dev-SNAPSHOT.jar
        #data-quality.error.output.path: /tmp/data-quality-error-data
        # Network IP gets priority, default inner outer
        # Whether hive SQL is executed in the same session
        support.hive.oneSession: false
        # use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
        sudo.enable: true
        # network interface preferred like eth0, default: empty
        #dolphin.scheduler.network.interface.preferred:
        # network IP gets priority, default: inner outer
        #dolphin.scheduler.network.priority.strategy: default
        # system env path
        #dolphinscheduler.env.path: dolphinscheduler_env.sh
        # development state
        development.state: false
        # rpc port
        alert.rpc.port: 50052
        # Url endpoint for zeppelin RESTful API
        zeppelin.rest.url: http://localhost:8080

### What you expected to happen

I think it's a problem with the backend interface

### How to reproduce

1. Use helm to deploy dolphinscheduler, with value.xml configured for persistence.
2. Configure the spark-2.4.7-bin-hadoop2.7 and hadoop-2.7.0.tar environments in the worker Pod, and copy in the hadoop, yarn, hdfs and other site.xml files from the external hadoop cluster.
3. Define a workflow on the UI page containing a spark task.
4. Once the task has been submitted to the external yarn, stop the task in the workflow instance.
5. Go to the yarn UI page to observe whether the task was actually killed.
6. Check the log in the worker's pod:

    kubectl logs -f dolphinscheduler-worker-0 --tail 500 -n namespace

### Anything else

_No response_

### Version

3.1.x

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
