I want to describe my idea.
First, we must design a host table named t_ds_ssh_host, e.g.:
- id
- name
- ip / host
- user
- password
- max_connection
- create_time
- update_time
- user_id

Second, a Shell task can execute either on the worker server or on a remote host. We can choose a host from the host list.

Third, while a workflow is running, we must maintain a connection pool for the remote host.

Finally, when the workflow finishes, we must release the connection pool.

Why do we have to maintain a connection pool? Because it is very easy to get exceptions when there are many SSH tasks.

------------------ Original Message ------------------
From: "lidong dai" <[email protected]>
Sent: 2020-05-20 7:12
To: "dev" <[email protected]>
Subject: Re: [Feature] Support SSH Task

glad to hear that you will implement this feature

Best Regards
---------------
DolphinScheduler(Incubator) PPMC
Lidong Dai
[email protected]
---------------

On 2020-05-20 at 3:47, <[email protected]> wrote:

> My code is not perfect yet. I will write a detailed design document. Then
> I will implement this feature based on the result of our discussion.
>
> ------------------ Original Message ------------------
> From: "wenhemin" <[email protected]>
> Sent: 2020-05-18 7:50
> To: <[email protected]>; "dev" <[email protected]>
> Subject: Re: [Feature] Support SSH Task and Support dummy task like airflow
>
> Thanks for writing detailed documentation. I think this is also a missing
> feature of DS.
> About the extension point:
> 1. Can ssh tasks be merged into shell tasks? Essentially, they all execute
> shell commands.
> 2. About dummy task, DS has the function of disabling nodes. I don't know if
> this requirement is met.
>
> The script from AirFlow to Dolphin is great.
>
> > On 2020-05-18 at 09:28, <[email protected]> wrote:
> >
> > OK, 3Q!
> >
> > First, I will ensure that open source can use it.
> >
> > Second, I think we must discuss this in depth. I wrote a more detailed
> > document. You can check the attachment. I also sent the document to
> > DaiLidong.
> >
> > Third, I'll give you the error I get when not using an SSH connection pool.
> >
> > ------------------ Original Message ------------------
> > From: "wenhemin" <[email protected]>
> > Sent: 2020-05-14 7:26
> > To: <[email protected]>
> > Subject: Re: [Feature] Support SSH Task and Support dummy task like airflow
> >
> > Great!
> > I think ssh tasks can be merged into shell tasks: execute the script
> > locally or remotely, configured on the front end.
> > About the ssh connection pool, I did not find it necessary to use a
> > connection pool.
> >
> > BTW, looking at the code, it introduces additional jar packages. You also
> > need to ensure that the license of each jar package allows open source use.
> >
> >> On 2020-05-14 at 16:20, <[email protected]> wrote:
> >>
> >> 1. The priority between these tasks also depends on the dolphin DAG
> >> definition. When the preceding task is not finished, the next task is not
> >> executed.
> >>
> >> 2. I extended the ssh task. I also use local params to configure the ssh
> >> host, user and password.
> >>
> >> E.g:
> >> public static AbstractTask newTask(TaskExecutionContext taskExecutionContext, Logger logger)
> >>     throws IllegalArgumentException {
> >>   // A task whose params carry "enable": false runs as a dummy task.
> >>   Boolean enable =
> >>       JSONUtils.parseObject(taskExecutionContext.getTaskParams()).getBoolean("enable");
> >>   if (enable != null && !enable) {
> >>     return new DummyTask(taskExecutionContext, logger);
> >>   }
> >>   switch (EnumUtils.getEnum(TaskType.class, taskExecutionContext.getTaskType())) {
> >>     case SHELL:
> >>       return new ShellTask(taskExecutionContext, logger);
> >>     case PROCEDURE:
> >>       return new ProcedureTask(taskExecutionContext, logger);
> >>     case SQL:
> >>       return new SqlTask(taskExecutionContext, logger);
> >>     case MR:
> >>       return new MapReduceTask(taskExecutionContext, logger);
> >>     case SPARK:
> >>       return new SparkTask(taskExecutionContext, logger);
> >>     case FLINK:
> >>       return new FlinkTask(taskExecutionContext, logger);
> >>     case PYTHON:
> >>       return new PythonTask(taskExecutionContext, logger);
> >>     case HTTP:
> >>       return new HttpTask(taskExecutionContext, logger);
> >>     case DATAX:
> >>       return new DataxTask(taskExecutionContext, logger);
> >>     case SQOOP:
> >>       return new SqoopTask(taskExecutionContext, logger);
> >>     case SSH:
> >>       // New task type added for remote execution over SSH.
> >>       return new SSHTask(taskExecutionContext, logger);
> >>     default:
> >>       logger.error("unsupport task type: {}", taskExecutionContext.getTaskType());
> >>       throw new IllegalArgumentException("not support task type");
> >>   }
> >> }
> >> 3. I am not sure whether it supports Windows or not.
> >>
> >> ------------------ Original Message ------------------
> >> From: "wenhemin" <[email protected]>
> >> Sent: 2020-05-14 3:46
> >> To: <[email protected]>
> >> Subject: Re: [Feature] Support SSH Task and Support dummy task like airflow
> >>
> >> Sorry, my previous description was not very clear.
> >>
> >> I want to ask some questions:
> >> 1. How to control the priority between ssh tasks? There may be some ssh
> >> tasks that have been waiting for execution.
> >> 2. I understand that what you want to solve is the problem of executing
> >> remote ssh scripts in batches. So, I am not sure how to use this function.
> >> 3. I don't know if this supports Windows systems.
> >>
> >>> On 2020-05-13 at 20:56, <[email protected]> wrote:
> >>>
> >>> I use a spin lock. Here is my code. Of course, it's not perfect. I just
> >>> did a test. To my surprise, the result of the execution is the same as
> >>> with AirFlow.
> >>>
> >>> ------------------ Original Message ------------------
> >>> From: "whm_777" <[email protected]>
> >>> Sent: 2020-05-13 7:21
> >>> To: <[email protected]>
> >>> Subject: Re: [Feature] Support SSH Task and Support dummy task like airflow
> >>>
> >>> You can modify the maximum number of Linux ssh connections.
> >>> If you use an ssh connection pool, how do you control the priority of ssh?
> >>>
> >>>> On 2020-05-13 at 18:01, <[email protected]> wrote:
> >>>>
> >>>> First, 3Q!
> >>>>
> >>>> I use more than 100 task nodes. But SSH connections are limited.
> >>>> [remainder in Chinese, lost to encoding]
> >>>>
> >>>> ------------------ Original Message ------------------
> >>>> From: "whm_777" <[email protected]>
> >>>> Sent: 2020-05-13 5:50
> >>>> To: <[email protected]>
> >>>> Subject: Re: [Feature] Support SSH Task and Support dummy task like airflow
> >>>>
> >>>> E.g.
> >>>> rtn_code=$(ssh -o ServerAliveInterval=60 -p xxxx [email protected] 'shell command >/dev/null 2>&1; echo $?')
> >>>> if [ "$rtn_code" -eq 0 ]; then
> >>>>         echo "success"
> >>>>         exit 0
> >>>> else
> >>>>         echo "failure"
> >>>>         exit 1
> >>>> fi
> >>>>
> >>>> Batch shell commands are not supported.
> >>>> Multiple servers can be split into multiple task nodes.
> >>>>
> >>>>> On 2020-05-13 at 17:40, <[email protected]> wrote:
> >>>>>
> >>>>> Could you give me an example? 3Q.
> >>>>>
> >>>>> By the way, I have more than 100 tasks in one DAG. These tasks connect
> >>>>> to two other servers to execute. So SSH tasks must have a pool to manage
> >>>>> them. For now I use JSch and have implemented a simple pool.
> >>>>>
> >>>>> ------------------ Original Message ------------------
> >>>>> From: "wenhemin" <[email protected]>
> >>>>> Sent: 2020-05-13 5:24
> >>>>> To: "dev" <[email protected]>
> >>>>> Subject: Re: [Feature] Support SSH Task and Support dummy task like airflow
> >>>>>
> >>>>> The shell node supports remote calling and can get the remote command's
> >>>>> result code.
> >>>>>
> >>>>> > On 2020-05-13 at 15:16, <[email protected]> wrote:
> >>>>> >
> >>>>> > Dear ALL,
> >>>>> >
> >>>>> > Support Linux SSH Task
> >>>>> >
> >>>>> > For example, in my project, the workflow's tasks need to execute shell
> >>>>> > scripts that live in different directories on different servers. When a
> >>>>> > worker executes these shell scripts, it must use the same user to log in
> >>>>> > to these servers. Also, the worker can get the execution state from
> >>>>> > these servers. We can configure each server's host, user and password.
> >>>>> >
> >>>>> > SSH Task is very useful for most users
> >>>>> >
> >>>>> > In dolphinscheduler, most tasks execute on different servers that are
> >>>>> > not workers. These servers also have their own fixed services. We just
> >>>>> > have to pass different parameters to schedule these shell scripts.
> >>>>> >
> >>>>> > Python has a module to execute SSH scripts
> >>>>> >
> >>>>> > Python has a module that can execute SSH shell scripts. It's paramiko.
> >>>>> >
> >>>>> > Others
> >>>>> >
> >>>>> > I found this described in a previous feature request, but it was
> >>>>> > relatively simple.
> >>>>> > Feature URL
> >>>>> >
> >>>>> > In addition, it is very inconvenient for me to perform remote tasks
> >>>>> > through Shell Task. Here is my script. I don't know if there's a better
> >>>>> > way.
> >>>>> > sshpass -p 'password' ssh user@host
> >>>>> > echo 'ssh success'
> >>>>> > echo 'Hello World' -> /home/dolphinscheduler/test/hello.txt
> >>>>> > echo 'end'
> >>>>> >
> >>>>> > Support dummy task like airflow
> >>>>> >
> >>>>> > For example, my project has a productized DAG file. The file contains
> >>>>> > different modules, some of which are interdependent and some of which
> >>>>> > are not. When customers purchase different modules, we need to set some
> >>>>> > tasks as dummy tasks: those belonging to modules that were not purchased
> >>>>> > and that the purchased modules do not depend on. Because of this
> >>>>> > setting, these dummy tasks are not actually executed. The benefits of
> >>>>> > this setup are product unity and diagram integrity. In AirFlow, these
> >>>>> > tasks are run with the DummyOperator.
> >>>>> >
> >>>>> > ** Realize **
> >>>>> >
> >>>>> > Dummy Task is easy to implement, but it needs to be used together with
> >>>>> > the other task types. When a task's execution type is set to dummy, the
> >>>>> > task is executed as a dummy task and the real task is not executed.
> >>>>> >
> >>>>> > By the way, I have already implemented these two features in my fork
> >>>>> > branch. Can a follow-up release support them?
> >>>>>
> >>>>
> >>>
> >>> <SSHClient.java><SSHPool.java><SSHTask.java>
> >>
> >
> > <??????????????Dolphin????????????.pdf>
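
To make the idea at the top of this mail a little more concrete (a host row with max_connection, plus something that limits concurrent SSH sessions per host and is released when the workflow finishes), here is a minimal Java sketch. It is only an illustration under assumptions: the class SshHostLimiter and its method names are hypothetical, it does not open real SSH sessions, and it uses a plain java.util.concurrent.Semaphore as a simpler stand-in for a full connection pool.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: caps the number of concurrent SSH sessions per host. */
final class SshHostLimiter {
    // One semaphore per host; its permit count plays the role of max_connection.
    private final Map<String, Semaphore> permitsByHost = new ConcurrentHashMap<>();

    /** Registers a host with the max_connection value read from t_ds_ssh_host. */
    public void register(String host, int maxConnection) {
        permitsByHost.putIfAbsent(host, new Semaphore(maxConnection, true));
    }

    /** Waits up to timeoutMillis for a free slot; returns false on timeout or interrupt. */
    public boolean tryAcquire(String host, long timeoutMillis) {
        Semaphore permits = permitsByHost.get(host);
        if (permits == null) {
            throw new IllegalArgumentException("unknown host: " + host);
        }
        try {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            return false;
        }
    }

    /** Gives a slot back; the caller must pair this with a successful tryAcquire. */
    public void release(String host) {
        Semaphore permits = permitsByHost.get(host);
        if (permits != null) {
            permits.release();
        }
    }

    /** Number of free slots for the host, or 0 if the host is not registered. */
    public int available(String host) {
        Semaphore permits = permitsByHost.get(host);
        return permits == null ? 0 : permits.availablePermits();
    }

    /** Drops all per-host bookkeeping, e.g. when the workflow finishes. */
    public void releaseAll() {
        permitsByHost.clear();
    }
}
```

In this sketch an SSH task would call tryAcquire(host, timeout) before opening a session and release(host) in a finally block, so that 100 task nodes pointing at the same host never hold more than max_connection sessions at once. releaseAll() corresponds to releasing the pool when the workflow finishes.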
