AhahaGe commented on a change in pull request #190:
URL: 
https://github.com/apache/incubator-dolphinscheduler-website/pull/190#discussion_r504589079



##########
File path: docs/en-us/1.3.2/user_doc/system-manual.md
##########
@@ -0,0 +1,886 @@
+# System User Manual
+
+## Get started quickly
+
+> Please refer to [Quick Start](quick-start.html)
+
+## Operation guide
+
+### 1. Home
+
+The home page contains task status statistics, process status statistics, and 
workflow definition statistics for all projects of the user.
+
+<p align="center">
+<img src="/img/home_en.png" width="80%" />
+</p>
+
+### 2. Project management
+
+#### 2.1 Create project
+
+- Click "Project Management" to enter the project management page, click the 
"Create Project" button, enter the project name, project description, and click 
"Submit" to create a new project.
+
+  <p align="center">
+      <img src="/img/create_project_en1.png" width="80%" />
+  </p>
+
+#### 2.2 Project home
+
+- Click the project name link on the project management page to enter the project home page. As shown in the figure below, the project home page contains the task status statistics, process status statistics, and workflow definition statistics of the project.
+  <p align="center">
+     <img src="/img/project_home_en.png" width="80%" />
+  </p>
+
+- Task status statistics: within the specified time range, count the number of task instances in the states of successful submission, running, ready to pause, paused, ready to stop, stopped, failed, succeeded, fault tolerance, killed, and waiting for thread.
+- Process status statistics: within the specified time range, count the number of workflow instances in the states of successful submission, running, ready to pause, paused, ready to stop, stopped, failed, succeeded, fault tolerance, killed, and waiting for thread.
+- Workflow definition statistics: count the workflow definitions created by this user and the workflow definitions granted to this user by the administrator.
+
+#### 2.3 Workflow definition
+
+#### <span id=creatDag>2.3.1 Create workflow definition</span>
+
+- Click Project Management -> Workflow -> Workflow Definition to enter the 
workflow definition page, and click the "Create Workflow" button to enter the 
**workflow DAG edit** page, as shown in the following figure:
+  <p align="center">
+      <img src="/img/dag5.png" width="80%" />
+  </p>
+- Drag the <img src="/img/shell.png" width="35"/> icon from the toolbar onto the canvas to add a Shell task, as shown in the figure below:
+  <p align="center">
+      <img src="/img/shell-en.png" width="80%" />
+  </p>
+- **Add parameter settings for this shell task:**
+
+1. Fill in the "Node Name", "Description", and "Script" fields (a minimal sample script is shown after this list);
+2. Check "Normal" for "Run Flag". If "Prohibit Execution" is checked, the task will not be executed when the workflow runs;
+3. Select "Task Priority": when the number of worker threads is insufficient, higher-priority tasks are executed first in the execution queue, and tasks with the same priority are executed in first-in, first-out order;
+4. Timeout alarm (optional): check "Timeout Alarm" and "Timeout Failure", and fill in the "timeout period". When the task execution time exceeds the **timeout period**, an alert email is sent and the task fails with a timeout;
+5. Resources (optional). Resource files are files created or uploaded on the Resource Center -> File Management page. For example, if the file name is `test.sh`, the command to call the resource in the script is `sh test.sh`;
+6. Custom parameters (optional); refer to [Custom Parameters](#UserDefinedParameters);
+7. Click the "Confirm Add" button to save the task settings.
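+
+As an illustration, a minimal script for the "Script" field might look like the sketch below. This is only an example: `dt` is a hypothetical custom parameter referenced with the `${...}` placeholder syntax, and `test.sh` is the resource file mentioned in step 5.
+
+```
+#!/bin/bash
+# print the date passed in as a custom parameter
+echo "processing data for ${dt}"
+# call the resource file uploaded in the Resource Center
+sh test.sh
+```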
+
+- **Set the order of task execution:** Click the <img src="/img/line.png" width="35"/> icon in the upper right corner to connect tasks. As shown in the figure below, task 2 and task 3 are executed in parallel: when task 1 finishes executing, tasks 2 and 3 are executed simultaneously.
+
+  <p align="center">
+     <img src="/img/dag6.png" width="80%" />
+  </p>
+
+- **Delete dependencies:** Click the "arrow" icon <img src="/img/arrow.png" width="35"/> in the upper right corner, select the connection line, and click the "Delete" icon <img src="/img/delete.png" width="35"/> in the upper right corner to delete the dependency between tasks.
+  <p align="center">
+     <img src="/img/dag7.png" width="80%" />
+  </p>
+
+- **Save workflow definition:** Click the "Save" button, and the "Set DAG chart name" dialog appears, as shown in the figure below. Enter the workflow definition name and workflow definition description, set global parameters (optional; refer to [Custom Parameters](#UserDefinedParameters)), and click the "Add" button; the workflow definition is created successfully.
+  <p align="center">
+     <img src="/img/dag8.png" width="80%" />
+   </p>
+> For other types of tasks, please refer to [Task Node Type and Parameter 
Settings](#TaskParamers).
+
+#### 2.3.2 Workflow definition operation functions
+
+Click Project Management -> Workflow -> Workflow Definition to enter the 
workflow definition page, as shown below:
+
+<p align="center">
+<img src="/img/work_list_en.png" width="80%" />
+</p>
+The operation functions of the workflow definition list are as follows:
+
+- **Edit:** Only "offline" workflow definitions can be edited. Workflow DAG editing is the same as [Create Workflow Definition](#creatDag).
+- **Online:** Available when the workflow status is "Offline"; brings the workflow online. Only a workflow in the "Online" state can be run, and it cannot be edited.
+- **Offline:** Available when the workflow status is "Online"; takes the workflow offline. Only a workflow in the "Offline" state can be edited, and it cannot be run.
+- **Run:** Only a workflow in the online state can be run. See [2.3.3 Run Workflow](#runWorkflow) for the operation steps.
+- **Timing:** Timing can only be set on an online workflow; the system then automatically schedules the workflow to run on a regular basis. The status after creating a timing is "offline", and the timing must be brought online on the timing management page to take effect. See [2.3.4 Workflow Timing](#creatTiming) for the timing operation steps.
+- **Timing Management:** On the timing management page, timings can be edited, brought online/offline, and deleted.
+- **Delete:** Delete the workflow definition.
+- **Download:** Download the workflow definition to the local machine.
+- **Tree Diagram:** Display the task node type and task status in a tree 
structure, as shown in the figure below:
+  <p align="center">
+      <img src="/img/tree_en.png" width="80%" />
+  </p>
+
+#### <span id=runWorkflow>2.3.3 Run the workflow</span>
+
+- Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, as shown in the figure below, and click the "Go Online" button <img src="/img/online.png" width="35"/> to bring the workflow online.
+  <p align="center">
+      <img src="/img/work_list_en.png" width="80%" />
+  </p>
+
+- Click the "Run" button to pop up the startup parameter setting pop-up box, 
as shown in the figure below, set the startup parameters, click the "Run" 
button in the pop-up box, the workflow starts running, and the workflow 
instance page generates a workflow instance.
+     <p align="center">
+       <img src="/img/run_work_en.png" width="80%" />
+     </p>  
+  <span id=runParamers>Description of workflow operating parameters:</span>
+
+      * Failure strategy: the strategy applied to other parallel task nodes when a task node fails. "Continue" means that after a task fails, the other task nodes continue to execute normally; "End" means that all tasks being executed are terminated and the entire process ends.
+      * Notification strategy: when the process ends, a process execution notification email is sent according to the process status. The options are: none (do not send for any status), send on success, send on failure, and send on success or failure.
+      * Process priority: the priority of process execution, divided into five levels: highest (HIGHEST), high (HIGH), medium (MEDIUM), low (LOW), and lowest (LOWEST). When the number of master threads is insufficient, higher-priority processes are executed first in the execution queue, and processes with the same priority are executed in first-in, first-out order.
+      * Worker group: the process can only be executed in the specified worker machine group. The default is "Default", which means the process can be executed on any worker.
+      * Notification group: when the notification strategy triggers, a timeout alarm occurs, or fault tolerance occurs, process information or an alarm email is sent to all members of the notification group.
+      * Recipient: when the notification strategy triggers, a timeout alarm occurs, or fault tolerance occurs, process information or an alarm email is sent to the recipient list.
+      * Cc: when the notification strategy triggers, a timeout alarm occurs, or fault tolerance occurs, the process information or alarm email is copied to the CC list.
+      * Complement: two modes are supported, serial complement and parallel complement. Serial complement: within the specified time range, the complement is executed sequentially from the start date to the end date, and only one process instance is generated. Parallel complement: within the specified time range, multiple days are complemented at the same time, generating N process instances.
+    * For example, suppose you need to complement the data from May 1 to May 10.
+
+    <p align="center">
+        <img src="/img/complement_en1.png" width="80%" />
+    </p>
+
+  > Serial mode: the complement is executed sequentially from May 1 to May 10, and one process instance is generated on the process instance page;
+
+  > Parallel mode: the tasks from May 1 to May 10 are executed simultaneously, and 10 process instances are generated on the process instance page.
+
+#### <span id=creatTiming>2.3.4 Workflow timing</span>
+
+- Create timing: Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, bring the workflow online, and click the "timing" button <img src="/img/timing.png" width="35"/>; the timing parameter setting dialog pops up, as shown in the figure below:
+  <p align="center">
+      <img src="/img/time_schedule_en.png" width="80%" />
+  </p>
+- Choose the start and end time. Within the start and end time range, the workflow runs on schedule; outside that range, no more scheduled workflow instances are generated.
+- Add a timing that is executed once every day at 5 AM, as shown in the 
following figure:
+  <p align="center">
+      <img src="/img/timer-en.png" width="80%" />
+  </p>
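+  > For reference, "once every day at 5 AM" corresponds to a crontab expression of the form `0 0 5 * * ? *`, assuming the Quartz-style crontab syntax (with a leading seconds field) used by the timing dialog.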
+- Failure strategy, notification strategy, process priority, worker group, 
notification group, recipient, and CC are the same as [workflow running 
parameters](#runParamers).
+- Click the "Create" button to create the timing successfully. At this time, 
the timing status is "**Offline**" and the timing needs to be **Online** to 
take effect.
+- Timing online: Click the "timing management" button <img src="/img/timeManagement.png" width="35"/> to enter the timing management page, then click the "online" button; the timing status changes to "online", as shown in the figure below, and the workflow is scheduled regularly from then on.
+  <p align="center">
+      <img src="/img/time-manage-list-en.png" width="80%" />
+  </p>
+
+#### 2.3.5 Import workflow
+
+Click Project Management -> Workflow -> Workflow Definition to enter the workflow definition page, then click the "Import Workflow" button to import a local workflow file. The workflow definition list displays the imported workflow, and its status is offline.
+
+#### 2.4 Workflow instance
+
+#### 2.4.1 View workflow instance
+
+- Click Project Management -> Workflow -> Workflow Instance to enter the 
Workflow Instance page, as shown in the figure below:
+     <p align="center">
+        <img src="/img/instance-list-en.png" width="80%" />
+     </p>
+- Click the workflow name to enter the DAG view page to view the task 
execution status, as shown in the figure below.
+  <p align="center">
+    <img src="/img/instance-runs-en.png" width="80%" />
+  </p>
+
+#### 2.4.2 View task log
+
+- Enter the workflow instance page, click the workflow name to enter the DAG view page, and double-click the task node, as shown in the following figure:
+   <p align="center">
+     <img src="/img/instanceViewLog-en.png" width="80%" />
+   </p>
+- Click "View Log", a log pop-up box will pop up, as shown in the figure 
below, the task log can also be viewed on the task instance page, refer to 
[Task View Log](#taskLog)。
+   <p align="center">
+     <img src="/img/task-log-en.png" width="80%" />
+   </p>
+
+#### 2.4.3 View task history
+
+- Click Project Management -> Workflow -> Workflow Instance to enter the 
workflow instance page, and click the workflow name to enter the workflow DAG 
page;
+- Double-click the task node, as shown in the figure below, and click "View History" to jump to the task instance page, which displays the list of task instances run by the workflow instance.
+   <p align="center">
+     <img src="/img/task_history_en.png" width="80%" />
+   </p>
+
+#### 2.4.4 View operating parameters
+
+- Click Project Management -> Workflow -> Workflow Instance to enter the 
workflow instance page, and click the workflow name to enter the workflow DAG 
page;
+- Click the <img src="/img/run_params_button.png" width="35"/> icon in the upper left corner to view the startup parameters of the workflow instance; click the <img src="/img/global_param.png" width="35"/> icon to view the global and local parameters of the workflow instance, as shown in the following figure:
+   <p align="center">
+     <img src="/img/run_params_en.png" width="80%" />
+   </p>
+
+#### 2.4.5 Workflow instance operation functions
+
+Click Project Management -> Workflow -> Workflow Instance to enter the 
Workflow Instance page, as shown in the figure below:
+
+  <p align="center">
+    <img src="/img/instance-list-en.png" width="80%" />
+  </p>
+
+- **Edit:** Only terminated processes can be edited. Click the "Edit" button or the name of the workflow instance to enter the DAG edit page. After editing, click the "Save" button to open the Save DAG dialog, as shown in the figure below. In the dialog, check "Whether to update to workflow definition" and save; the workflow definition is then updated. If it is not checked, the workflow definition is not updated.
+     <p align="center">
+       <img src="/img/editDag-en.png" width="80%" />
+     </p>
+- **Rerun:** Re-execute the terminated process.
+- **Recovery failed:** For failed processes, a recovery operation can be performed, starting from the failed node.
+- **Stop:** **Stop** the running process: the background first sends a `kill` to the worker process, and then executes a `kill -9` operation.
+- **Pause:** Perform a **pause** operation on the running process: the system status changes to **waiting for execution**, waits for the task currently being executed to finish, and pauses the next task to be executed.
+- **Resume pause:** Resume the paused process, starting directly from the **paused node**.
+- **Delete:** Delete the workflow instance and the task instances under the workflow instance.
+- **Gantt chart:** The vertical axis of the Gantt chart is the topological 
sorting of task instances under a certain workflow instance, and the horizontal 
axis is the running time of the task instances, as shown in the figure:
+     <p align="center">
+         <img src="/img/gantt-en.png" width="80%" />
+     </p>
+
+#### 2.5 Task instance
+
+- Click Project Management -> Workflow -> Task Instance to enter the task instance page, as shown in the figure below. Click the name of a workflow instance to jump to its DAG chart and view the task status.
+     <p align="center">
+        <img src="/img/task-list-en.png" width="80%" />
+     </p>
+
+- <span id=taskLog>View log:</span> Click the "View Log" button in the operation column to view the task execution log.
+     <p align="center">
+        <img src="/img/task-log2-en.png" width="80%" />
+     </p>
+
+### 3. Resource Center
+
+#### 3.1 HDFS resource configuration
+
+- To upload resource files and UDF functions, note that all uploaded files and resources are stored on HDFS, so the following configuration items are required:
+
+```
+
+conf/common/common.properties
+    # user who has permission to create directories under the HDFS root path
+    hdfs.root.user=hdfs
+    # base data dir; resource files are stored under this HDFS path. Configure it yourself and make sure the directory exists on HDFS with read/write permissions. "/dolphinscheduler" is recommended
+    data.store2hdfs.basepath=/dolphinscheduler
+    # resource upload startup type: HDFS, S3, NONE
+    res.upload.startup.type=HDFS
+    # whether kerberos is enabled
+    hadoop.security.authentication.startup.state=false
+    # java.security.krb5.conf path
+    java.security.krb5.conf.path=/opt/krb5.conf
+    # loginUserFromKeytab user
+    login.user.keytab.username=hdfs-myclus...@esz.com
+    # loginUserFromKeytab path
+    login.user.keytab.path=/opt/hdfs.headless.keytab
+
+conf/common/hadoop.properties
+    # HA or single namenode. For namenode HA, copy core-site.xml and hdfs-site.xml
+    # to the conf directory. S3 is also supported, for example: s3a://dolphinscheduler
+    fs.defaultFS=hdfs://mycluster:8020
+    # for resourcemanager HA, configure the IPs here; leave it empty for a single resourcemanager
+    yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
+    # for a single resourcemanager, you only need to configure one host name; for resourcemanager HA, the default configuration is fine
+    yarn.application.status.address=http://xxxx:8088/ws/v1/cluster/apps/%s
+
+```
+
+- Only one of yarn.resourcemanager.ha.rm.ids and yarn.application.status.address needs to be configured, depending on your deployment; leave the other empty. The two cases are sketched after this list.
+- You need to copy core-site.xml and hdfs-site.xml from the conf directory of 
the Hadoop cluster to the conf directory of the dolphinscheduler project, and 
restart the api-server service.
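+
+For example (host names are illustrative), the two cases would be configured roughly as follows:
+
+```
+# single resourcemanager: leave the ha ids empty and point the status address at the single host
+yarn.resourcemanager.ha.rm.ids=
+yarn.application.status.address=http://single-rm-host:8088/ws/v1/cluster/apps/%s
+
+# resourcemanager HA: fill in the ids; the default status address can be kept
+yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
+```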
+
+#### 3.2 File management
+
+> File management handles various resource files, including creating basic txt/log/sh/conf/py/java and other files, and uploading jar packages and other types of files; files can be edited, renamed, downloaded, and deleted.
+
+  <p align="center">
+   <img src="/img/file-manage-en.png" width="80%" />
+ </p>
+
+- Create a file
+  > The file format supports the following types: txt, log, sh, conf, cfg, py, 
java, sql, xml, hql, properties
+
+<p align="center">
+   <img src="/img/file_create_en.png" width="80%" />
+ </p>
+
+- Upload files
+
+> Upload file: Click the "Upload File" button to upload, or drag the file to the upload area; the file name is automatically completed with the uploaded file name.
+
+<p align="center">
+   <img src="/img/file-upload-en.png" width="80%" />
+ </p>
+
+- File View
+
+> For the file types that can be viewed, click the file name to view the file 
details
+
+<p align="center">
+   <img src="/img/file_detail_en.png" width="80%" />
+ </p>
+
+- Download file
+
+> Click the "Download" button in the file list to download the file or click 
the "Download" button in the upper right corner of the file details to download 
the file
+
+- File rename
+
+<p align="center">
+   <img src="/img/file_rename_en.png" width="80%" />
+ </p>
+
+- Delete
+  > File list -> Click the "Delete" button to delete the specified file
+
+#### 3.3 UDF management
+
+#### 3.3.1 Resource management
+
+> Resource management is similar to file management; the difference is that resource management holds the uploaded UDF functions, while file management holds user programs, scripts, and configuration files.
+> Operation functions: rename, download, delete.
+
+- Upload UDF resources
+  > Same as uploading files.
+
+#### 3.3.2 Function management
+
+- Create UDF function
+  > Click "Create UDF Function", enter the UDF function parameters, select the UDF resource, and click "Submit" to create the UDF function.
+
+> Currently only temporary HIVE UDF functions are supported.
+
+- UDF function name: the name used when calling the UDF function
+- Package name and class name: enter the fully qualified class name of the UDF function
+- UDF resource: set the resource file corresponding to the created UDF
+
+<p align="center">
+   <img src="/img/udf_edit_en.png" width="80%" />
+ </p>
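+
+Conceptually, the UDF created here corresponds to a temporary HIVE function registration when it is used in a SQL task; a sketch of the equivalent HiveQL, with illustrative function and class names, is:
+
+```
+-- bind the class in the uploaded UDF resource jar to a function name
+CREATE TEMPORARY FUNCTION my_upper AS 'com.example.udf.MyUpper';
+-- the function can then be used in the task's SQL statement
+SELECT my_upper(name) FROM users;
+```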
+
+### 4. Create data source
+
+> Data source center supports MySQL, POSTGRESQL, HIVE/IMPALA, SPARK, 
CLICKHOUSE, ORACLE, SQLSERVER and other data sources
+
+#### 4.1 Create/Edit MySQL data source
+
+- Click "Data Source Center -> Create Data Source" to create different types 
of data sources according to requirements.
+
+- Data source: select MYSQL
+- Data source name: enter the name of the data source
+- Description: Enter a description of the data source
+- IP hostname: enter the IP to connect to MySQL
+- Port: Enter the port to connect to MySQL
+- Username: Set the username for connecting to MySQL
+- Password: Set the password for connecting to MySQL
+- Database name: Enter the name of the database connected to MySQL
+- Jdbc connection parameters: parameter settings for MySQL connection, filled 
in in JSON form
+
+<p align="center">
+   <img src="/img/mysql-en.png" width="80%" />
+ </p>
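+
+For example, Jdbc connection parameters entered in JSON format might look like the following (the keys shown are common MySQL driver options and are purely illustrative):
+
+```
+{"useUnicode":"true","characterEncoding":"utf8","useSSL":"false"}
+```
+
+The same JSON format applies to the Jdbc connection parameters of the other data source types below.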
+
+> Click "Test Connection" to test whether the data source can be successfully 
connected.
+
+#### 4.2 Create/Edit POSTGRESQL data source
+
+- Data source: select POSTGRESQL
+- Data source name: enter the name of the data source
+- Description: enter a description of the data source
+- IP/hostname: enter the IP or hostname for connecting to POSTGRESQL
+- Port: enter the port for connecting to POSTGRESQL
+- Username: set the username for connecting to POSTGRESQL
+- Password: set the password for connecting to POSTGRESQL
+- Database name: enter the name of the POSTGRESQL database to connect to
+- Jdbc connection parameters: parameter settings for the POSTGRESQL connection, entered in JSON format
+
+<p align="center">
+   <img src="/img/postgresql-en.png" width="80%" />
+ </p>
+
+#### 4.3 Create/Edit HIVE data source
+
+1. Use HiveServer2 to connect
+
+ <p align="center">
+    <img src="/img/hive-en.png" width="80%" />
+  </p>
+
+- Data source: select HIVE
+- Data source name: enter the name of the data source
+- Description: enter a description of the data source
+- IP/hostname: enter the IP or hostname for connecting to HIVE
+- Port: enter the port for connecting to HIVE
+- Username: set the username for connecting to HIVE
+- Password: set the password for connecting to HIVE
+- Database name: enter the name of the HIVE database to connect to
+- Jdbc connection parameters: parameter settings for the HIVE connection, entered in JSON format
+
+  2. Use HiveServer2 HA with ZooKeeper to connect
+
+ <p align="center">
+    <img src="/img/hive1-en.png" width="80%" />
+  </p>
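+
+In HA mode the connection goes through the ZooKeeper quorum rather than a single HiveServer2 host; the underlying JDBC URL typically takes a form like the following (hosts and namespace are illustrative):
+
+```
+jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
+```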
+
+Note: If you enable **kerberos**, you need to fill in **Principal**
+
+<p align="center">
+    <img src="/img/hive-en.png" width="80%" />
+  </p>
+
+#### 4.4 Create/Edit Spark data source
+
+<p align="center">
+   <img src="/img/spark-en.png" width="80%" />
+ </p>
+
+- Data source: select Spark
+- Data source name: enter the name of the data source
+- Description: enter a description of the data source
+- IP/hostname: enter the IP or hostname for connecting to Spark
+- Port: enter the port for connecting to Spark
+- Username: set the username for connecting to Spark
+- Password: set the password for connecting to Spark
+- Database name: enter the name of the Spark database to connect to
+- Jdbc connection parameters: parameter settings for the Spark connection, entered in JSON format
+
+### 5. Security Center (Permission System)
+
+     * Only the administrator account in the security center has the authority 
to operate. It has functions such as queue management, tenant management, user 
management, alarm group management, worker group management, token management, 
etc. In the user management module, resources, data sources, projects, etc. 
Authorization

Review comment:
       ```suggestion
        * Only the administrator account has the authority to operate the security center. It has functions such as queue management, tenant management, user management, alarm group management, worker group management, token management, etc. We can authorize resources, data sources, and projects in the user management module.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

