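Not a good idea. This feels too complicated; let's wait until the large JSON has been split before revisiting it. Also, right now there is only the SQL node, which is not suitable for parsing dependencies; only SQLScript would be! Configuring dependencies is fairly troublesome, and creating them automatically is not a high priority at this stage.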

From: Hemin Wen <[email protected]>
Date: Wednesday, October 28, 2020 at 11:49
To: dev <[email protected]>
Subject: [DISCUSS] Table lineage design

Hi!

I would like to propose automatic dependency configuration based on table lineage. Everyone is welcome to discuss the idea.

## 1. Background

Currently, DS can only set up workflow/node dependencies by drawing the DAG, or by calling the API to create the workflow and its dependencies from the workflow's data structure.

A data warehouse is generally designed in layers, and data is produced along a chain: there are complex dependencies between layers, and there are many SQL scripts.

Creating dependencies by hand makes large numbers of workflows hard to maintain, and dependency configuration errors are hard to troubleshoot.

Table lineage can be extracted by parsing the SQL statements in SQL-related nodes, and dependencies can then be established automatically from that lineage.

The Master Server executes the workflow according to the supplemented dependencies, ensuring that nodes run in dependency order.

## 2. Design Ideas

- Parse the table lineage of the SQL when a workflow is saved, and automatically generate the dependency configuration data (SQL-related nodes only)
- The Master Server automatically resolves dependencies for each node, generates dependent nodes, and executes all node tasks
- The front-end node configuration page gains an "Automatically resolve dependencies" switch that controls whether dependency detection is enabled when the node executes
- A dependency graph page is added to the front end, so that node dependencies can be viewed after automatic resolution

Limitations:

- In the current design, the default rule for automatically generated dependent nodes only supports checking whether the node's task for the current day succeeded. It uses a fixed configuration: the status is checked every N minutes, up to M times, and once that count is exceeded the check is treated as a failure.

## 3. Sequence diagram

Please refer to the pictures below.

[cid:ii_kgsus5mg0]
[cid:ii_kgsusdlj1]

## 4. Table Design

Add a node lineage relationship table: t_ds_node_lineage

| Column Name           | Description                      |
| --------------------- | -------------------------------- |
| id                    | Auto-incrementing ID             |
| process_definition_id | Workflow definition ID           |
| process_node_id       | Workflow node ID                 |
| lineage_type          | Lineage type (1 input, 2 output) |
| lineage_union_key     | Unique lineage key               |
| create_time           | Creation time                    |

--------------------
DolphinScheduler (Incubator) Committer
Hemin Wen 温合民
[email protected]
--------------------
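
To make the first design idea in the quoted proposal more concrete (parse SQL for table lineage, then derive node dependencies from it), here is a minimal Python sketch. It is not DolphinScheduler code. The regular expressions are a deliberately naive stand-in for a real SQL parser, and the function names, node names, and table names are all hypothetical. It also assumes that `lineage_union_key` would hold something like a qualified table name, which the proposal does not specify.

```python
import re
from collections import defaultdict

# Lineage types as listed in the proposed t_ds_node_lineage table.
LINEAGE_INPUT = 1
LINEAGE_OUTPUT = 2

# Naive patterns, for illustration only. A real SQL parser would be needed
# to handle subqueries, CTEs, comments, quoted identifiers, and so on.
_OUTPUT_RE = re.compile(
    r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
_INPUT_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(node_id, sql):
    """Return sorted (node_id, lineage_type, table) rows for one SQL node."""
    rows = set()
    for table in _OUTPUT_RE.findall(sql):
        rows.add((node_id, LINEAGE_OUTPUT, table.lower()))
    for table in _INPUT_RE.findall(sql):
        rows.add((node_id, LINEAGE_INPUT, table.lower()))
    return sorted(rows)


def derive_dependencies(lineage_rows):
    """Map each node to the sorted list of nodes that write a table it reads."""
    writers = defaultdict(set)
    for node_id, lineage_type, table in lineage_rows:
        if lineage_type == LINEAGE_OUTPUT:
            writers[table].add(node_id)

    deps = defaultdict(set)
    for node_id, lineage_type, table in lineage_rows:
        if lineage_type == LINEAGE_INPUT:
            # A node never depends on itself, even if it reads its own output.
            deps[node_id] |= writers[table] - {node_id}
    return {node: sorted(up) for node, up in deps.items() if up}


# Hypothetical two-node workflow: ODS -> DWD -> DWS.
nodes = {
    "ods_to_dwd": "INSERT OVERWRITE TABLE dwd.orders SELECT * FROM ods.orders",
    "dwd_to_dws": (
        "INSERT INTO dws.order_stats "
        "SELECT user_id, count(*) FROM dwd.orders GROUP BY user_id"
    ),
}
rows = [row for nid, sql in nodes.items() for row in extract_lineage(nid, sql)]
dependencies = derive_dependencies(rows)
print(dependencies)
```

In this sketch `dwd_to_dws` ends up depending on `ods_to_dwd`, because it reads `dwd.orders`, which the other node writes. Tables that no node in the workflow writes (here `ods.orders`) produce no dependency, which is one reason the objection above about multi-statement scripts and parsing complexity matters in practice.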
